Course Description

Functions are a fundamental building block of the R language. You’ve probably used dozens (or even hundreds) of functions written by others, but in order to take your R game to the next level, you’ll need to learn to write your own functions. This course will teach you the fundamentals of writing functions in R so that, among other things, you can make your code more readable, avoid coding errors, and automate repetitive tasks.

1. A quick refresher

Before we embark, we’ll make sure you’re ready for this course by reviewing some of the prerequisites. You’ll review the syntax of writing a function in R, the basic data types in R, subsetting and writing for loops. It won’t all be review, we’ll introduce you to a few new things that will be helpful throughout the course.

Video: Writing a function in R

Writing a function

Exercise

The function template is a useful way to start writing a function:

my_fun <- function(arg1, arg2) {
  # body
}

my_fun is the variable that you want to assign your function to, arg1 and arg2 are arguments to the function. The template has two arguments, but you can specify any number of arguments, each separated by a comma. You then replace # body with the R code that your function will execute, referring to the inputs by the argument names your specified.

Let’s get to writing functions!

Instructions

  • Using the function template, write a function ratio() that takes arguments x and y and returns their ratio, x / y.
  • Check your function by calling it with the arguments 3 and 4.
# Define ratio() function
ratio <- function(x, y) {
  x / y
}

# Call ratio() with arguments 3 and 4
ratio(3, 4)
## [1] 0.75

Arguments

Exercise

How did you call your function ratio() in the previous exercise? Do you remember the two ways to specify the arguments? (If you have forgotten it might be useful to review the video from Intermediate R)

You probably either did ratio(3, 4), which relies on matching by position, or ratio(x = 3, y = 4), which relies on matching by name.

For functions you and others use often, it’s okay to use positional matching for the first one or two arguments. These are usually the data to be computed on. Good examples are the x argument to the summary functions (mean(), sd(), etc.) and the x and y arguments to plotting functions.

However, beyond the first couple of arguments you should always use matching by name. It makes your code much easier for you and others to read. This is particularly important if the argument is optional, because it has default. When overriding a default value, it’s good practice to use the name.

Notice that when you call a function, you should place a space after a comma, not before (just like in regular English). Using whitespace makes it easier to skim the function for the important components.

Instructions

Here’s a rather cryptic call to mean(). Rewrite the call to follow the best practices we just discussed. Use the natural ordering of the arguments, which you can find by typing ?mean in the console.

# Rewrite the call to follow best practices
mean(c(1:9, NA), trim = 0.1, na.rm = TRUE)
## [1] 5

Function output

Consider the following function that takes a numeric value, x, as input:

f <- function(x) {
  if (TRUE) {
    return(x + 1)
  }
  x
}

What will be the result of calling f(2)?

Possible Answers
  • 2
  • 3
  • An error

Great job! The body of the conditional is always evaluated and the function returns early without ever running x.

Video: Environments

Testing your understanding of scoping (1)

Exercise

The next few questions are designed to test your understanding of scoping. For first timers, these concepts will be hard and you might not get it the first time through. We decided to include it in the first chapter because it’s good to know now, so when something unexpected happens, you know what to come back to and review.

For now, take a careful look at the function featured in each question. Try to predict what the function will return without running it! Being able to reason about a function is an important skill for R programmers.

Consider the following:

y <- 10
f <- function(x) {
  x + y
}

What will f(10) return?

Possible Answers
  • An error
  • 10
  • 15
  • 20

Correct! Because y is not passed in as an argument to the function, R looks outside of the function environment.

Testing your understanding of scoping (2)

Consider the following:

y <- 10
f <- function(x) {
  y <- 5
  x + y
}

What will f(10) return?

Possible Answers
  • An error
  • 10
  • 15
  • 20

Nice one! The value of x is passed in as an argument to the function and the value of y is defined inside of the function.

Testng your understanding of scoping (3)

This one’s a little different. Instead of predicting what the function will return, you need to predict the value of a variable. We run the following in a fresh session in the console:

f <- function(x) {
  y <- 5
  x + y
}
f(5)

Now, what will typing y return?

Possible Answers
  • An error
  • 5
  • 10
  • 20

Correct! Even though y is set equal to 5 within the body of the function, the object does not exist in the global environment.

Video: Data structures

Atomic types of vectors

Which of these is NOT one of the atomic types of vectors in R?

Possible Answers
  • logical
  • integer
  • double
  • character
  • factor

Subsetting lists

There a few ways to subset a list. Throughout the course we’ll mostly use double bracket ([[]]) subsetting by index and by name.

That is, my_list[[1]] extracts the first element of the list my_list, and my_list[["name"]] extracts the element in my_list that is called name. If the list is nested you can travel down the hierarchy by recursive subsetting. For example, my_list[[1]][["name]] is the element called name inside the first element of my_list.

A data frame is just a special kind of list, so you can use double bracket subsetting on data frames too, my_df[[1]] will extract the first column of a data frame and my_df[["name"]] will extract the column named name from the data frame.

I’ve set up a list called tricky_list in your workspace. Use the function typeof() combined with double bracket subsetting to answer the following questions.

Instructions

What type of object is the…

  • 2nd element in tricky_list?
  • element called x in tricky_list?
  • 2nd element inside the element called x in tricky_list?

Note: Use the double bracket notation to extract elements of the list.

tricky_list <- list(
  nums = c(0.5433085, -1.1538035, 1.2599351, 0.3498936, -0.2438708, 1.9490014, -0.6785343, 0.3676652, -1.1371768, 0.8308079), 
  y = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE), 
  x = list("hello", "hi!", "goodbye!", "bye!"), 
  model = lm(formula = mpg ~ wt, data = mtcars)
)

# 2nd element in tricky_list
typeof(tricky_list[[2]])
## [1] "logical"
# Element called x in tricky_list
typeof(tricky_list[["x"]])
## [1] "list"
# 2nd element inside the element called x in tricky_list
typeof(tricky_list[["x"]][[2]])
## [1] "character"

Exploring lists

Exercise

Often you won’t know exactly what is inside a list. But, you’ll need to figure it out to get some useful piece of data. Extracting elements from the output of the names() and str() functions is a great way to explore the structure of a list.

Calling names() on a list will give you names at the top level of the list and str() will give you a full description of the entire list (which can sometimes be a little overwhelming).

tricky_list has a regression model stored in it. Let’s see if we can drill down and pull out the slope estimate corresponding to the wt variable.

Instructions

  • Use names() on tricky_list to guess where the regression model is stored.
  • Use names() and str() on the model element of tricky_list.
  • Extract the coefficients element of the model element of tricky_list.
  • Extract the wt element of the coefficients element of the model element of tricky_list.

Note: Use the double bracket notation to extract elements of the list.

# Guess where the regression model is stored
names(tricky_list)
## [1] "nums"  "y"     "x"     "model"
# Use names() and str() on the model element
names(tricky_list$model)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"
str(tricky_list$model)
## List of 12
##  $ coefficients : Named num [1:2] 37.29 -5.34
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "wt"
##  $ residuals    : Named num [1:32] -2.28 -0.92 -2.09 1.3 -0.2 ...
##   ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##  $ effects      : Named num [1:32] -113.65 -29.116 -1.661 1.631 0.111 ...
##   ..- attr(*, "names")= chr [1:32] "(Intercept)" "wt" "" "" ...
##  $ rank         : int 2
##  $ fitted.values: Named num [1:32] 23.3 21.9 24.9 20.1 18.9 ...
##   ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##  $ assign       : int [1:2] 0 1
##  $ qr           :List of 5
##   ..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##   .. .. ..$ : chr [1:2] "(Intercept)" "wt"
##   .. ..- attr(*, "assign")= int [1:2] 0 1
##   ..$ qraux: num [1:2] 1.18 1.05
##   ..$ pivot: int [1:2] 1 2
##   ..$ tol  : num 1e-07
##   ..$ rank : int 2
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 30
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = mpg ~ wt, data = mtcars)
##  $ terms        :Classes 'terms', 'formula'  language mpg ~ wt
##   .. ..- attr(*, "variables")= language list(mpg, wt)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "mpg" "wt"
##   .. .. .. ..$ : chr "wt"
##   .. ..- attr(*, "term.labels")= chr "wt"
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(mpg, wt)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
##  $ model        :'data.frame':   32 obs. of  2 variables:
##   ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##   ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language mpg ~ wt
##   .. .. ..- attr(*, "variables")= language list(mpg, wt)
##   .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:2] "mpg" "wt"
##   .. .. .. .. ..$ : chr "wt"
##   .. .. ..- attr(*, "term.labels")= chr "wt"
##   .. .. ..- attr(*, "order")= int 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(mpg, wt)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
##  - attr(*, "class")= chr "lm"
# Subset the coefficients element
tricky_list$model$coefficients
## (Intercept)          wt 
##   37.285126   -5.344472
# Subset the wt element
tricky_list$model$coefficients[["wt"]]
## [1] -5.344472

Excellent work! Another way is to use the coef() function on the model element of tricky_list.

Video: for loops

A safer way to create the sequence

Exercise

Let’s take a look at the sequence component of our for loop:

i in 1:ncol(df)

Each time our for loop iterates, i takes the next value in 1:ncol(df). This is a pretty common model for a sequence: a sequence of consecutive integers designed to index over one dimension of our data.

What might surprise you is that this isn’t the best way to generate such a sequence, especially when you are using for loops inside your own functions. Let’s look at an example where df is an empty data frame:

df <- data.frame()
1:ncol(df)

for (i in 1:ncol(df)) {
  print(median(df[[i]]))
}

Our sequence is now the somewhat non-sensical: 1, 0. You might think you wouldn’t be silly enough to use a for loop with an empty data frame, but once you start writing your own functions, there’s no telling what the input will be.

A better method is to use the seq_along() function. This function generates a sequence along the index of the object passed to it, but handles the empty case much better.

Instructions

  • Replace the 1:ncol(df) sequence in our for loop with seq_along(df).
  • Change the value of df to an empty data frame.
  • Repeat the for loop to verify there is no error.
df1 <- data.frame(a = c(-1.26272674, 0.05566754, 1.10385911, -0.74148033, -1.06092338, 1.30082893, 1.05572799, 0.50386200, 2.96581138, 0.09361089), b = c(1.03361615, -0.67150813, -0.71684510, -2.02185287, -0.59079083, -1.09392460, 0.01939328, -0.94027939, 0.28912993, -0.21311041), c = c(1.3972461, -0.4300333, -0.8542174, -0.2505189, -1.6230256, 0.3762698, -3.0804333, -0.7753514, -2.4670001, -2.4857207), d = c(0.864732549, -0.779882564, -0.937921427, 0.106842685, -0.503514794, 0.126978932, 0.007054644, -0.554996198, 0.260304380, 1.027297975))

# Replace the 1:ncol(df) sequence
for (i in seq_along(df1)) {
  print(median(df1[[i]]))
}
## [1] 0.2987364
## [1] -0.6311495
## [1] -0.8147844
## [1] 0.05694866
# Change the value of df
df <- data.frame()

# Repeat for loop to verify there is no error
for (i in seq_along(df)) {
  print(median(df[[i]]))
}

Keeping output

Exercise

Our for loop does a good job displaying the column medians, but we might want to store these medians in a vector for future use.

Before you start the loop, you must always allocate sufficient space for the output, let’s say an object called output. This is very important for efficiency: if you grow the for loop at each iteration (e.g. using c()), your for loop will be very slow.

A general way of creating an empty vector of given length is the vector() function. It has two arguments: the type of the vector ("logical", "integer", "double", "character", etc.) and the length of the vector.

Then, at each oteration of the loop you must store the output in the corresponding entry of the output vector, i.e. assign the result to output[[i]]. (You might ask why we are using double brackets here when output is a vector. It’s primarily for generalizability: this subsetting will wotk whether output is a vector or a list.)

Let’s edit our loop to store the medians, rather than printing them to the console.

Instructions

  • Create a new "double" vector of length ncol(df) and name it output.
  • Inside the body of the loop, store the result of each iteration result in the ith element of output.
  • Print output to the console.
df2 <- data.frame(a = c(-2.023584106, 0.095603871, -0.108814482, -1.562952654, -0.001695936, -0.662148862, -1.453288540, -1.224299431, 0.095396832, -0.787936455), b = c(2.1864750, -1.1326362, -1.4270034, -1.3547144, -0.9374406, 2.2645230, -1.8042590, -0.9469231, -1.1271026, 0.1081373), c = c(-0.5798420, 0.1221429, -1.0678397, 1.0096803, 0.1396902, -0.7214353, 1.4382972, 0.3411223, -0.5589626, 0.3901412), d = c(-0.4541943, -1.1309516, -0.9768001, -1.2112622, -1.1094934, -0.3615033, -0.9000185, 0.6892541, 0.1882872, -1.1494923))

# Create new double vector: output
output <- vector("double", ncol(df2))

# Alter the loop
for (i in seq_along(df2)) {
  # Change code to store result in output
  output[[i]] <- median(df2[[i]])
}

# Print output
output
## [1] -0.7250427 -1.0370128  0.1309165 -0.9384093

2. When and how you should write a function

Writing your own functions is one way to reduce duplication in your code. In this chapter, you’ll learn when to write a function, how to get started and what to keep in mind when you are writing. You’ll also learn to appreciate that functions have two audiences: the computer (which runs the code) and humans (who need to be able to understand the code).

Video: Why should you write a function?

Start with a snippet of code

Exercise

We have a snippet of code that successfully rescales a column to be between 0 and 1:

(df$a - min(df$a, na.rm = TRUE)) /  
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

Our goal over the next few exercises is to turn this snippet, written to work on the a column in the data frame df, into a general purpose rescale01() function that we can apply to any vector.

The first step of turning a snippet into a function is to examine the snippet and decide how many inputs there are, then rewrite the snippet to refer to these inputs using temporary names. These inputs will become the arguments to our function, so choosing good names for them is important. (We’ll talk more about naming arguments in a later exercise.)

In this snippet, there is one input: the numeric vector to be rescaled (currently df$a). What would be a good name for this input? It’s quite common in R to refer to a vector of data simply as x (like in the mean function), so we will follow that convention here.

Instructions

  • To help you test your code, create a vector x containing the numbers 1 through 10.
  • Rewrite the code snippet to use the temporary variable x instead of referring to the data frame column df$a.
# Define example vector x
x <- c(1:10)

# Rewrite this snippet to refer to x
(x - min(x, na.rm = TRUE)) /
  (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
##  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
##  [8] 0.7777778 0.8888889 1.0000000

Rewrite for clarity

Exercise

Our next step is to examine our snippet and see if we can write it more clearly.

Take a close look at our rewritten snippet. Do you see any duplication?

(x - min(x, na.rm = TRUE)) /
  (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

One obviously duplicated statement is min(x, na.rm = TRUE). It makes more sense for us just to calculate it once, store the result, and then refer to it when needed. In fact, since we also need the maximum value of x, it would be even better to calculate the range once, then refer to the first and second elements when they are needed.

What should we call this intermediate variable? You’ll soon get the message that using good names is an important part of writing clear code! I suggest we call it rng (for “range”).

Instructions

  • Define the intermediate variable rng to contain the range of x using the function range(). Specify the na.rm() argument to automatically ignore any NAs in the vector.
  • Rewrite the snippet to refer to this intermediate variable.
# Define example vector x
x <- 1:10

# Define rng
rng <- range(x, na.rm = TRUE)

# Rewrite this snippet to refer to the elements of rng
(x - rng[1]) /
  (rng[2] - rng[1])
##  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
##  [8] 0.7777778 0.8888889 1.0000000

Finally turn it into a function!

Exercise

What do you need to write a function? You need a name for the function, you need to know the arguments to the function, and you need code that forms the body of the function.

We now have all these pieces ready to put together. It’s time to write the function!

Instructions

  • Use the function template to create a function called rescale01():
  • The function should take a single argument x.
  • The body of the function should be our rewritten snippet from the previous exercise: rng <- range(x, na.rm = TRUE) (x - rng[1]) / (rng[2] - rng[1])
  • Once you’ve written the function, test it out by calling it on the x we’ve already defined.
# Define example vector x
x <- 1:10 

# Use the function template to create the rescale01 function
rescale01 <- function(x) {
  # body
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Test your function, call rescale01 using the vector x as the argument
rescale01(x)
##  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
##  [8] 0.7777778 0.8888889 1.0000000

Video: How should you write a function?

Start with a simple problem

Exercise

Let’s tackle a new problem. We want to write a function, both_na() that counts at how many positions two vectors, x and y, both have a missing value.

How do we get started? Should we start writing our function?

both_na <- function(x, y) {
  # something goes here?
}

No! We should start by solving a simple example problem first. Let’s define an x and y where we know what the answer both_na(x, y) should be.

Let’s start with:

x <- c( 1, 2, NA, 3, NA)
y <- c(NA, 3, NA, 3,  4)

Then both_na(x, y) should return 1, since there is only one element that is missing in both x and y, the third element.

(Notice we introduced a couple of extra spaces to each vector. Adding spaces to x and y to make them match up makes it much easier to see what the correct result is. Code formatting is an important aspect of writing clear code.)

Your first task is to write a line of code that gets to that answer.

Instructions

Write a line of code that finds the number of positions where both x and y have missing values. You may find the is.na() and sum() functions useful, as well as the & operator.

# Define example vectors x and y
x <- c( 1, 2, NA, 3, NA)
y <- c(NA, 3, NA, 3,  4)

# Count how many elements are missing in both x and y
sum(is.na(x) & is.na(y))
## [1] 1

Rewrite snippet as function

Exercise

Great! You now have a snippet of code that works:

sum(is.na(x) & is.na(y))

You’ve already figured out it should have two inputs (the two vectors to compare) and we’ve even given them reasonable names: x and y. Our snippet is also so simple we can’t write it any clearer.

So, let’s go ahead and turn this snippet into a function!

Instructions

  • This time, instead of using the function template, we’ve provided the body of the function. Wrap the body with function assignment and curly braces.
  • Assign the function to the name both_na().
  • both_na() should take two arguments, x and y.
# Define example vectors x and y
x <- c( 1, 2, NA, 3, NA)
y <- c(NA, 3, NA, 3,  4)

# Turn this snippet into a function: both_na()
both_na <- function(x, y) {
  sum(is.na(x) & is.na(y))
}

Put our function to use

Exercise

We have a function that works in at least one situation, but we should probably check it works in others.

Consider the following vectors:

x <-  c(NA, NA, NA)
y1 <- c( 1, NA, NA)
y2 <- c( 1, NA, NA, NA)

What would you expect both_na(x, y1) to return? What about both_na(x, y2)? Does your function return what you expected? Try it and see!

Instructions

  • Create the vectors x, y1 and y2 according the above examples.
  • Call both_na() with x and y1.
  • Call both_na() with x and y2.
# Define x, y1 and y2
x <- c(NA, NA, NA)
y1 <- c(1, NA, NA)
y2 <- c(1, NA, NA, NA)

# Call both_na on x, y1
both_na(x, y1)
## [1] 2
# Call both_na on x, y2
both_na(x, y2)
## Warning in is.na(x) & is.na(y): longer object length is not a multiple of
## shorter object length
## [1] 3

Video: How can you write a good function?

Good function names

Consider the following function, f2(), which has also been loaded into your workspace:

f2 <- function(x) {
  if (length(x) <= 1) return(NULL)
  x[-length(x)]
}

f2() isn’t a very good function name! Which of the following would make a good name for this function?

Possible Answers
  • rmv()
  • remove_first()
  • shorten()
  • removed_last()
  • remove_last()

Perfect! It’s a verb, and descriptive. Can you brainstorm other good names for this function?

Argument names

Exercise

It’s not just your function that needs a good name. Your arguments need good names too!

Take a look at this function, which calculates a confidence interval for a population mean:

mean_ci <- function(c, nums) {
  se <- sd(nums) / sqrt(length(nums))
  alpha <- 1 - c
  mean(nums) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

The argument nums is a sample of data and the argument c controls the level of the confidence interval. For example, if c = 0.95, we get a 95% confidence interval. Are c and nums good arguments names?

c is a particularly bad name. It is completely non-descriptive and it’s the name of an existing function in R. What might be better? Maybe something like confidence, since it reveals the purpose of the argument: to control the level of confidence for the interval. Another option might be level, since it’s the same name used for the confint function in base R and your users may already be familiar with that name for this parameter.

nums is not inherently bad, but since it’s the placeholder for the vector of data, a name like x would be more recognizable to users.

Instructions

Rewrite the mean_ci() function to take arguments named level and x instead of c and nums, in that order for now.

# Rewrite mean_ci to take arguments named level and x
mean_ci <- function(level, x) {
  se <- sd(x) / sqrt(length(x))
  alpha <- 1 - level
  mean(x) + se * qnorm(c(alpha / 2, level / 2))
}

Argument order

Exercise

Aside from giving your arguments good names, you should put some thought into what order your arguments are in and if they should have defaults or not.

Arguments are often one of two types:

  • Data arguments supply the data to compute on.
  • Detail arguments control the details of how the computation is done.

Generally, data arguments should come first. Detail arguments should go on the end, and usually should have default values.

Take another look at our function:

mean_ci <- function(level, x) {
  se <- sd(x) / sqrt(length(x))
  alpha <- 1 - level
  mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

x is the data to be used for the calculation, so it’s a data argument and should come first.

level is a detail argument; it just controls the level of confidence for which the interval is constructed. It should come after the data arguments and we should give it a default value, say 0.95, since it is common to want a 95% confidence interval.

Instructions

  • Move the data argument, x, to the front.
  • Move the detail argument, level, to the end and give it the default value 0.95.
# Alter the arguments to mean_ci
mean_ci <- function(x, level = 0.95) {
  se <- sd(x) / sqrt(length(x))
  alpha <- 1 - level
  mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

Reurn statements

Exercise

One of your colleagues has noticed if you pass mean_ci() an empty vector it returns a confidence interval with missing values at both ends (try it: mean_ci(numeric(0))). In this case, they decided it would make more sense to produce a warning "x was empty" and return c(-Inf, Inf) and have edited the function to be:

mean_ci <- function(x, level = 0.95) {
  if (length(x) == 0) {
    warning("`x` was empty", call. = FALSE)
    interval <- c(-Inf, Inf)
  } else { 
    se <- sd(x) / sqrt(length(x))
    alpha <- 1 - level
    interval <- mean(x) + 
      se * qnorm(c(alpha / 2, 1 - alpha / 2))
  }
  interval
}

Notice how hard it is now to follow the logic of the function. If you want to know what happens in the empty x case, you need to read the entire function to check if anything happens to interval before the function returns. There isn’t much to read in this case, but if this was a longer function you might be scrolling through pages of code.

This is a case where an early return() makes sense. If x is empty, the function should immediately return c(-Inf, Inf).

Instructions

Edit the mean_ci function using an if statement to check for the case when x is empty and if so, to produce the same warning as the code above then immediately return() c(-Inf, Inf).

# Alter the mean_ci function
mean_ci <- function(x, level = 0.95) {

  if (length(x) == 0) {
    warning("`x` was empty", call. = FALSE)
    return(c(-Inf, Inf))
  }
  
  se <- sd(x) / sqrt(length(x))
  alpha <- 1 - level
  mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

What does this function do?

Exercise

Over the next few exercises, we’ll practice everything you’ve learned so far. Here’s a poorly written function, which is also available in your workspace:

f <- function(x, y) {
  x[is.na(x)] <- y
  cat(sum(is.na(x)), y, "\n")
  x
}

Your job is to turn it in to a nicely written function.

What does this function do? Let’s try to figure it out by passing in some arguments.

Instructions

  • Define a numeric vector x with the values: 1, 2, NA, 4 and 5.
  • Call f() with the arguments x = x, and y = 3.
  • Call f() with the arguments x = x, and y = 10.
# Define a numeric vector x with the values 1, 2, NA, 4 and 5
x <- c(1, 2, NA, 4, 5)

# Call f() with the arguments x = x and y = 3
f(x=x, y=3)
## 0 3
## [1] 1 2 3 4 5
# Call f() with the arguments x = x and y = 10
f(x=x, y=10)
## 0 10
## [1]  1  2 10  4  5

Let’s make it clear from its name

Exercise

Did you figure out what f() does? f() takes a vector x and replaces any missing values in it with the value y.

Imagine you came across the line df$z <- f(df$z, 0) a little further on in the code. What does that line do? Now you know, it replaces any missing values in the column df$z with the value 0. But anyone else who comes across that line is going to have to go back and find the definition of f and see if they can reason it out.

Let’s rename our function and arguments to make it obvious to everyone what we are trying to achieve.

Instructions

  • Rename the function f() to replace_missings().
  • Change the name of the y argument to replacement.
  • Now replace the missing values of df$z with 0’s using your new function. Make sure you assign the result back to df$z.
# Rename the function f() to replace_missings()
replace_missings <- function(x, replacement) {
  # Change the name of the y argument to replacement
  x[is.na(x)] <- replacement
  cat(sum(is.na(x)), replacement, "\n")
  x
}

# Rewrite the call on df$z to match our new names
df$z <- replace_missings(df$z, 0)
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'NULL'
## 0 0

Make the body more understandable

Exercise

Great! Now when someone comes across

df$z <- replace_missings(df$z, replacement = 0)

in your code, it’s pretty obvious what you are trying to achieve. The body of our replace_missings() function is still a little messy. There is certainly some unnecessary duplication.

Instructions

  • Define is_miss, a logical vector that identifies the missing values in x.
  • To reduce unncessary duplication, alter the rest of function to refer to is_miss instead of is.na(x).
replace_missings <- function(x, replacement) {
  # Define is_miss
  is_miss <- is.na(x)
  
  # Rewrite rest of function to refer to is_miss
  x[is_miss] <- replacement
  cat(sum(is_miss), replacement, "\n")
  x
}

Much better! But a few more tweaks…

Exercise

Did you notice replace_missings() prints some output to the console? Try it:

replace_missings(df$z, replacement = 0)

That output isn’t exactly self-explanatory. It would be much nicer to say “2 missing values replaced by the value 0”.

It is also bad practice to use cat() for anything other than a print() method (a function designed just to display output). Having an important message just print to the screen makes it very hard for other people who might be programming with your function to capture the output and handle it appropriately.

The official R way to supply simple diagnostic information is the message() function. The unnamed arguments are pasted together with no separator (and no need for a newline at the end) and by default are printed to the screen.

Let’s make our function nicer for users by using message() and making the output self-contained.

Instructions

  • Rewrite to use message() and output sum(is_miss) missings replaced by the value replacement.
  • Check your new function by replacing all the missing values of df$z with 0’s, as we’ve done before.
replace_missings <- function(x, replacement) {
  is_miss <- is.na(x)
  x[is_miss] <- replacement
  
  # Rewrite to use message()
  message(sum(is_miss), " missings replaces by the value ", replacement)
  x
}

# Check your new function by running on df$z
replace_missings(df$z, replacement = 0)
## 0 missings replaces by the value 0
## numeric(0)

3. Functional programming

You already know how to use a for loop. The goal of this chapter is to teach you how to use the map functions in the purrr package which remove the code that’s duplicated across multiple for loops. After completing this chapter you’ll be able to solve new iteration problems with greater ease (faster and with fewer bugs).

Video: Why functional programming?

Using a for loop to remove duplication

Exercise

Imagine we have a data frame called df:

df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

We want to compute the median of each column. You could do this with copy and paste (df is available in your workspace, so go ahead and try it in the console):

median(df[[1]])
median(df[[2]])
median(df[[3]])
median(df[[4]])

But that’s a lot of repetition! Let’s start by seeing how we could reduce the duplication by using a for loop.

Instructions

We’ve provided some code to get you started. Fill in the body of the for loop to calculate the median of each column and store the results in output.

# Initialize output vector
output <- vector("double", ncol(df))  

# Fill in the body of the for loop
for (i in seq_along(df)) {            
  output[i] <- median(df[[i]])
}

# View the result
output
## [1] NA

Turning the for loop into a function

Exercise

Now, imagine you need to do this to another data frame df2. You copy and paste the for loop, and edit every reference to df to be df2 instead.

output <- vector("double", ncol(df2))  
for (i in seq_along(df2)) {            
  output[[i]] <- median(df2[[i]])      
}
output

And then you realize you have another data frame df3 for which you also want the column medians. You copy and paste…and realize you’ve copied and pasted two times. Time to write a function!

Instructions

Turn the for loop snippet into a function called col_median() that takes one argument df and returns the vector of column medians.

# Turn this code into col_median()
output <- vector("double", ncol(df))
col_median <- function(df) {
  for (i in seq_along(df)) {            
    output[[i]] <- median(df[[i]])      
  }
  output
}

col_median
## function(df) {
##   for (i in seq_along(df)) {            
##     output[[i]] <- median(df[[i]])      
##   }
##   output
## }

What about column means?

Exercise

What if instead of medians of every column you actually want means?

Let’s write a col_mean() function that returns the vector of column means.

Instructions

  • We have copied and pasted the col_median() function into the editor.
  • Create a col_mean() function by editing col_median() to find the column means instead.
# Create col_mean() function to find column means
col_mean <- function(df) {
  output <- numeric(length(df))
  for (i in seq_along(df)) {
    output[[i]] <- mean(df[[i]])
  }
  output
}

What about column standard deviations?

Exercise

You now have functions for column medians and means, what about one for standard deviations?

col_median <- function(df) {
  output <- numeric(length(df))
  for (i in seq_along(df)) {
    output[[i]] <- median(df[[i]])
  }
  output
}

col_mean <- function(df) {
  output <- numeric(length(df))
  for (i in seq_along(df)) {
    output[[i]] <- mean(df[[i]])
  }
  output
}

Instructions

  • Copy and paste the col_median function into the editor.
  • Edit col_median function to find the column standard deviations instead.
  • Use the name col_sd for your new function.
# Define col_sd() function
col_sd <- function(df) {
  output <- numeric(length(df))
  for (i in seq_along(df)) {
    output[[i]] <- sd(df[[i]])
  }
  output
}

Uh oh…time to write a function again

Exercise

We just copied and pasted the function col_median two times. That’s a sure sign we need to write a function. How can we write a function that will take column summaries for any summary function we provide?

Let’s look at a simpler example first. Consider the functions f1(), f2() and f3() that take a vector x and return deviations from the mean value raised to the powers 1, 2, and 3 respectively:

f1 <- function(x) abs(x - mean(x)) ^ 1
f2 <- function(x) abs(x - mean(x)) ^ 2
f3 <- function(x) abs(x - mean(x)) ^ 3

How could you remove the duplication in this set of function definitions?

Hopefully, you would suggest writing a single function with two arguments: x and power. That way, one function reproduces all the functionality of f1(), f2() and f3(), and more.

Instructions

  • Edit the function f() to take a second argument power.
  • Edit the body of f() so that the absolute deviations raised to power are returned.
# Add a second argument called power
f <- function(x, power) {
    # Edit the body to return absolute deviations raised to power
    abs(x - mean(x)) ** power
}

Video: Functions can be arguments too

Using a function as an argument

Exercise

You just saw that we can remove the duplication in our set of summary functions by requiring the function doing the summary as an input. This leads to creating the col_summary function:

col_summary <- function(df, fun) {
  output <- vector("numeric", length(df))
  for (i in seq_along(df)) {
    output[[i]] <- fun(df[[i]])
  }
  output
}

It may be kind of surprising that you can pass a function as an argument to another function, so let’s verify first that it worked. We’ve found the column means and medians using our old col_mean() and col_median() functions. Your job is to repeat the calculations using col_summary() instead and verify that it works.

Not only does col_summary() remove the duplication in the functions we’ve already written, it also allows us to apply any summary to the columns of a data frame. Verify this, by finding the column interquartile ranges using the function IQR(). (For more info, see ?IQR.)

Instructions

  • Find the column medians of df by specifying median as the fun argument to col_summary().
  • Find the column means of df by specifying mean as the fun argument to col_summary().
  • Find the column interquartile ranges of df using col_summary().
# Find the column medians using col_median() and col_summary()
col_median(df)
## [1] NA
col_summary(df, median)
## [1] NA
# Find the column means using col_mean() and col_summary()
col_mean(df)
## [1] NaN
col_summary(df, mean)
## [1] NaN
# Find the column IQRs using col_summary()
col_summary(df, IQR)
## [1] NA

Nice! If you’ve used the lapply() or sapply() functions before, then you’ve passed functions as arguments. If not, you may want to review them in the Intermediate R course before moving on to the next video.

Video: Introducing purrr

The map functions

Exercise

All the map functions in purrr take a vector, .x, as the first argument, then return .f applied to each element of .x. The type of object that is returned is determined by function suffix (the part after _):

  • map() returns a list or data frame
  • map_lgl() returns a logical vector
  • map_int() returns a integer vector
  • map_dbl() returns a double vector
  • map_chr() returns a character vector

Let’s repeat our column summaries using a map function instead of our col_summary() function.

Which map function shall we use? In our case, every summary we calculated returned a single numeric value, so we’ll use map_dbl().

Instructions

Use map_dbl() to find the…

  • Column means of the data frame df.
  • Column medians of the data frame df.
  • Column standard deviations of the data frame df.
# Load the purrr package
library(purrr)
## Warning: package 'purrr' was built under R version 3.4.4
# Use map_dbl() to find column means
map_dbl(df, mean)
##   z 
## NaN
# Use map_dbl() to column medians
map_dbl(df, median)
##  z 
## NA
# Use map_dbl() to find column standard deviations
map_dbl(df, sd)
##  z 
## NA

The … argument to the map functions

Exercise

The map functions use the ... (“dot dot dot”) argument to pass along additional arguments to .f each time it’s called. For example, we can pass the trim argument to the mean() function:

map_dbl(df, mean, trim = 0.5)

Multiple arguments can be passed along using commas to separate them. For example, we can also pass the na.rm argument to mean():

map_dbl(df, mean, trim = 0.5, na.rm = TRUE)

You don’t have to specify the arguments by name, but it is good practice!

You may be wondering why the arguments to map() are .x and .f and not x and f? It’s because .x and .f are very unlikely to be argument names you might pass through the ..., thereby preventing confusion about whether an argument belongs to map() or to the function being mapped.

Let’s get a bit of practice with this. We’ll apply our new knowledge to a subset of the planes data frame available in the nycflights13 package. Use map_dbl() to find the average and 5th percentile of each column in planes.

Instructions

  • Find the column means of planes by combining`map_dbl() with mean().
  • Find the column means of planes again, but this time exclude missing values from the calculation.
  • Find the 5th percentile of each column in planes by combining map_dbl with quantile(). Don’t forget to exclude missing values!
planes <- data.frame(year = c(1956, 1975, 1977, 1996, 2010, NA), engines = c(4, 1, 2, 2, 2, 1), seats = c(102, 4, 139, 142, 20, 2), speed = c(232, 108, 432, NA, NA, NA))

# Find the mean of each column
map_dbl(planes, mean)
##     year  engines    seats    speed 
##       NA  2.00000 68.16667       NA
# Find the mean of each column, excluding missing values
map_dbl(planes, mean, na.rm = TRUE)
##       year    engines      seats      speed 
## 1982.80000    2.00000   68.16667  257.33333
# Find the 5th percentile of each column, excluding missing values
map_dbl(planes, quantile, probs = 0.05, na.rm = TRUE)
##    year engines   seats   speed 
##  1959.8     1.0     2.5   120.4

Great work! Other nice features of the map functions are that they’re implemented in C, which makes them really fast, and that they generally preserve names from their input in the output.

Picking the right map function

Exercise

Choosing the right map function is important. You can always use map(), which will return a list. However, if you know what type of output you expect, you are better to use the corresponding function. That way, if you expect one thing and get another, you’ll know immediately because the map function will return an error.

For example, try running:

map_lgl(df, mean)

The map functions are what we call type consistent. This means you know exactly what type of output to expect regardless of the input. map_lgl() either returns either a logical vector or an error. map_dbl() returns either a double or an error.

One way to check the output type is to run the corresponding function on the first element. For example, mean(df[[1]]) returns a single numeric value, suggesting map_dbl().

Let’s get some more practice with the map functions. We’ve created a new data frame, df3, in your workspace. Use a map function to find which columns are numeric, the type of each column, and a summary of each column.

Instructions

Remember to choose the appropriate map function based on the output you expect for each of the following:

  • Find which columns are numeric in df3 by combining a map function with is.numeric().
  • Find the type of each column in df3 by combining a map function with typeof().
  • Find a summary of each column in df3 by combining a map function with summary().
df3 <- data.frame(A = c(-2.13641409, -0.54173437, 1.25655044, 0.17926296, -1.86171148, -0.46128828, -0.52251896, 0.06505055, -0.03916377, 0.30235985), B = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B"), C = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), D = c(0.8053080, -0.6494968, -0.1815008, 0.8507846, 0.2905451, 0.7030261, 0.6552918, -1.3814747, -0.6142817, -2.1614109))

# Find the columns that are numeric
map_lgl(df3, is.numeric)
##     A     B     C     D 
##  TRUE FALSE  TRUE  TRUE
# Find the type of each column
map_chr(df3, typeof)
##         A         B         C         D 
##  "double" "integer"  "double"  "double"
# Find a summary of each column
map(df3, summary)
## $A
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.1364 -0.5369 -0.2502 -0.3760  0.1507  1.2566 
## 
## $B
## A B 
## 5 5 
## 
## $C
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    3.25    5.50    5.50    7.75   10.00 
## 
## $D
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -2.16141 -0.64069  0.05452 -0.16832  0.69109  0.85078

Excellent work! In chapter 5 we’ll learn more about why type consistent functions are better to program with, and why you should avoid type inconsistent functions like sapply().

Video: Shortcuts

Solve a simple problem first

Exercise

Our goal is to fit a separate linear regression of miles per gallon (mpg) against weight (wt) for each group of cars in our list of data frames, where each data frame in our list represents a different group. How should we get started?

First, let’s confirm the structure of this list of data frames. Then, we’ll solve a simpler problem first: fit the regression to the first group of cars.

Instructions

  • Examine the structure of cyl with str().
  • Extract the first data frame in cyl and assign it to the variable four_cyls.
  • Fit a linear regression of miles per gallon on weight using four_cyls as the data argument to lm().
cyl <- list(`4` = mtcars[which(mtcars$cyl == 4), ], `6` = mtcars[which(mtcars$cyl == 6), ], `8` = mtcars[which(mtcars$cyl == 8), ])

# Examine the structure of cyl
str(cyl)
## List of 3
##  $ 4:'data.frame':   11 obs. of  11 variables:
##   ..$ mpg : num [1:11] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26 30.4 ...
##   ..$ cyl : num [1:11] 4 4 4 4 4 4 4 4 4 4 ...
##   ..$ disp: num [1:11] 108 146.7 140.8 78.7 75.7 ...
##   ..$ hp  : num [1:11] 93 62 95 66 52 65 97 66 91 113 ...
##   ..$ drat: num [1:11] 3.85 3.69 3.92 4.08 4.93 4.22 3.7 4.08 4.43 3.77 ...
##   ..$ wt  : num [1:11] 2.32 3.19 3.15 2.2 1.61 ...
##   ..$ qsec: num [1:11] 18.6 20 22.9 19.5 18.5 ...
##   ..$ vs  : num [1:11] 1 1 1 1 1 1 1 1 0 1 ...
##   ..$ am  : num [1:11] 1 0 0 1 1 1 0 1 1 1 ...
##   ..$ gear: num [1:11] 4 4 4 4 4 4 3 4 5 5 ...
##   ..$ carb: num [1:11] 1 2 2 1 2 1 1 1 2 2 ...
##  $ 6:'data.frame':   7 obs. of  11 variables:
##   ..$ mpg : num [1:7] 21 21 21.4 18.1 19.2 17.8 19.7
##   ..$ cyl : num [1:7] 6 6 6 6 6 6 6
##   ..$ disp: num [1:7] 160 160 258 225 168 ...
##   ..$ hp  : num [1:7] 110 110 110 105 123 123 175
##   ..$ drat: num [1:7] 3.9 3.9 3.08 2.76 3.92 3.92 3.62
##   ..$ wt  : num [1:7] 2.62 2.88 3.21 3.46 3.44 ...
##   ..$ qsec: num [1:7] 16.5 17 19.4 20.2 18.3 ...
##   ..$ vs  : num [1:7] 0 0 1 1 1 1 0
##   ..$ am  : num [1:7] 1 1 0 0 0 0 1
##   ..$ gear: num [1:7] 4 4 3 3 4 4 5
##   ..$ carb: num [1:7] 4 4 1 1 4 4 6
##  $ 8:'data.frame':   14 obs. of  11 variables:
##   ..$ mpg : num [1:14] 18.7 14.3 16.4 17.3 15.2 10.4 10.4 14.7 15.5 15.2 ...
##   ..$ cyl : num [1:14] 8 8 8 8 8 8 8 8 8 8 ...
##   ..$ disp: num [1:14] 360 360 276 276 276 ...
##   ..$ hp  : num [1:14] 175 245 180 180 180 205 215 230 150 150 ...
##   ..$ drat: num [1:14] 3.15 3.21 3.07 3.07 3.07 2.93 3 3.23 2.76 3.15 ...
##   ..$ wt  : num [1:14] 3.44 3.57 4.07 3.73 3.78 ...
##   ..$ qsec: num [1:14] 17 15.8 17.4 17.6 18 ...
##   ..$ vs  : num [1:14] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ am  : num [1:14] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ gear: num [1:14] 3 3 3 3 3 3 3 3 3 3 ...
##   ..$ carb: num [1:14] 2 4 3 3 3 4 4 4 2 2 ...
# Extract the first element into four_cyls
four_cyls <- cyl[[1]]

# Fit a linear regression of mpg on wt using four_cyls
lm(mpg ~ wt, data = four_cyls)
## 
## Call:
## lm(formula = mpg ~ wt, data = four_cyls)
## 
## Coefficients:
## (Intercept)           wt  
##      39.571       -5.647

Using an anonymous function

Exercise

Great! We now have a snippet of code that performs the operation we want on one data frame. One option would be to turn this into a function, for example:

fit_reg <- function(df) {
  lm(mpg ~ wt, data = df)
}

Then pass this function into map():

map(cyl, fit_reg)

But it seems a bit much to define a function for such a specific model when we only want to do this once. Instead of defining the function in the global environment, we will just use the function anonymously inside our call to map().

What does this mean? Instead of referring to our function by name in map(), we define it on the fly in the .f argument to map().

Instructions

Rewrite the map() call to use the anonymous function function(df) lm(mpg ~ wt, data = df).

# Rewrite to call an anonymous function
map(cyl, function(df) lm(mpg ~ wt, data = df))
## $`4`
## 
## Call:
## lm(formula = mpg ~ wt, data = df)
## 
## Coefficients:
## (Intercept)           wt  
##      39.571       -5.647  
## 
## 
## $`6`
## 
## Call:
## lm(formula = mpg ~ wt, data = df)
## 
## Coefficients:
## (Intercept)           wt  
##       28.41        -2.78  
## 
## 
## $`8`
## 
## Call:
## lm(formula = mpg ~ wt, data = df)
## 
## Coefficients:
## (Intercept)           wt  
##      23.868       -2.192

Using a formula

Exercise

Writing anonymous functions takes a lot of extra key strokes, so purrr provides a shortcut that allows you to write an anonymous function as a one-sided formula instead.

In R, a one-sided formula starts with a ~, followed by an R expression. In purrr’s map functions, the R expression can refer to an element of the .x argument using the . character.

Let’s take a look at an example. Imagine, instead of a regression on each data frame in cyl, we wanted to know the mean displacement for each data frame. One way to do this would be to use an anonymous function:

map_dbl(cyl, function(df) mean(df$disp))

To perform the same operation using the formula shortcut, we replace the function definition (function(df)) with the ~, then when we need to refer to the element of cyl the function operates on (in this case df), we use a ..

map_dbl(cyl, ~ mean(.$disp))

See how much less typing it involves! It also saves you from coming up with an argument name. Can you rewrite our previous anonymous function using this formula shortcut instead?

Instructions

Rewrite our call to map() to use the formula notation instead of an anonymous function.

# Rewrite to use the formula shortcut instead
map(cyl, ~ lm(mpg ~ wt, data = .))
## $`4`
## 
## Call:
## lm(formula = mpg ~ wt, data = .)
## 
## Coefficients:
## (Intercept)           wt  
##      39.571       -5.647  
## 
## 
## $`6`
## 
## Call:
## lm(formula = mpg ~ wt, data = .)
## 
## Coefficients:
## (Intercept)           wt  
##       28.41        -2.78  
## 
## 
## $`8`
## 
## Call:
## lm(formula = mpg ~ wt, data = .)
## 
## Coefficients:
## (Intercept)           wt  
##      23.868       -2.192

Using a string

Exercise

There are also some useful shortcuts that come in handy when you want to subset each element of the .x argument. If the .f argument to a map function is set equal to a string, let’s say "name", then purrr extracts the "name" element from every element of .x.

This is a really common situation you find yourself in when you work with nested lists. For example, if we have a list of where every element contains an a and b element:

list_of_results <- list(
  list(a = 1, b = "A"), 
  list(a = 2, b = "C"), 
  list(a = 3, b = "D")
)

We might want to pull out the a element from every entry. We could do it with the string shortcut like this:

map(list_of_results, "a")

Now take our list of regresssion models:

map(cyl, ~ lm(mpg ~ wt, data = .))

It might be nice to extract the slope coefficient from each model. You’ll do this in a few steps: first fit the models, then get the coefficients from each model using the coef() function, then pull out the wt estimate using the string shortcut.

Instructions

  • Assign the result from the previous exercise to the variable models.
  • Use map and the coef() function to extract the coefficients from each model, and save it in the variable coefs.
  • Use map and the string shortcut to extract the slope wt element from coefficients vectors.
# Save the result from the previous exercise to the variable models
models <- map(cyl, ~ lm(mpg ~ wt, data = .))

# Use map and coef to get the coefficients for each model: coefs
coefs <- map(models, coef)

# Use string shortcut to extract the wt coefficient 
map(coefs, "wt")
## $`4`
## [1] -5.647025
## 
## $`6`
## [1] -2.780106
## 
## $`8`
## [1] -2.192438

Using a numeric vector

Exercise

Another useful shortcut for subsetting is to pass a numeric vector as the .f argument. This works just like passing a string but subsets by index rather than name. For example, with your previous list_of_results:

list_of_results <- list(
  list(a = 1, b = "A"), 
  list(a = 2, b = "C"), 
  list(a = 3, b = "D")
)

Another way to pull out the a element from each list, is to pull out the first element:

map(list_of_results, 1)

Let’s pull out the slopes from our models again, but this time using numeric subsetting. Also, since we are pulling out a single numeric value from each element, let’s use map_dbl().

Instructions

Extract the second element from each vector in coefs using the numeric shortcut and map_dbl().

coefs <- map(models, coef)

# use map_dbl with the numeric shortcut to pull out the second element
map_dbl(coefs, 2)
##         4         6         8 
## -5.647025 -2.780106 -2.192438

Putting it together with pipes

Exercise

purrr also includes a pipe operator: %>%. The pipe operator is another shortcut that saves typing, but also increases readability. The explanation of the pipe operator is quite simple: x %>% f(y) is another way of writing f(x, y). That is, the left hand side of the pipe, x, becomes the first argument to the function, f(), on the right hand side of the pipe.

Take a look at our code to get our list of models:

cyl <- split(mtcars, mtcars$cyl) 
map(cyl, ~ lm(mpg ~ wt, data = .))

We split the data frame mtcars and save it as the variable cyl. We then pass cyl as the first argument to map to fit the models. We could rewrite this using the pipe operator as:

split(mtcars, mtcars$cyl) %>% 
  map(~ lm(mpg ~ wt, data = .))

We read this as “split the data frame mtcars on cyl, then use map() on the result.”

One of the powerful things about the pipe is we can chain together many operations. Here is our complete code, written with pipes, instead assigning each step to a variable and using it in the next step:

mtcars %>% 
  split(mtcars$cyl) %>%
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(coef) %>% 
  map_dbl("wt")

We’ve written some code in the editor to pull out the R2 from each model. Rewrite the last two lines to use a pipe instead.

Instructions

Rewrite the last two lines to use a pipe instead.

# Define models (don't change)
models <- mtcars %>% 
  split(mtcars$cyl) %>%
  map(~ lm(mpg ~ wt, data = .))

# Rewrite to be a single command using pipes 
summaries <- models %>%
  map(summary) %>%
  map_dbl("r.squared")

Fantastic! We’ll use the pipe operator a lot in the remaining chapters, so you’ll get lots of practice using it.

4. Advanced inputs and outputs

Now you’ve seen how useful the map functions are for reducing duplication, we’ll introduce you to a few more functions in purrr that allow you to handle more complicated inputs and outputs. In particular, you’ll learn how to deal with functions that might return an error, how to iterate over multiple arguments and how to iterate over functions that have no output at all.

Video: Dealing with failure

Creating a safe function

Exercise

safely() is an adverb; it takes a verb and modifies it. That is, it takes a function as an argument and it returns a function as its output. The function that is returned is modified so it never throws an error (and never stops the rest of your computation!).

Instead, it always returns a list with two elements:

  1. result is the original result. If there was an error, this will be NULL.
  2. error is an error object. If the operation was successful this will be NULL.

Let’s try to make the readLines() function safe.

Instructions

  • Pass the readLines() function to safely(), and assign the output to safe_readLines.
  • Use safe_readLines() on the string "http://example.org" to read the example homepage HTML file.
  • Use safe_readLines() on "http://asdfasdasdkfjlda", a nonsense web address that shouldn’t be found.
# Create safe_readLines() by passing readLines() to safely()
safe_readLines <- safely(readLines)

# Call safe_readLines() on "http://example.org"
safe_readLines("http://example.org")
## $result
##  [1] "<!doctype html>"                                                                                      
##  [2] "<html>"                                                                                               
##  [3] "<head>"                                                                                               
##  [4] "    <title>Example Domain</title>"                                                                    
##  [5] ""                                                                                                     
##  [6] "    <meta charset=\"utf-8\" />"                                                                       
##  [7] "    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />"                        
##  [8] "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />"                       
##  [9] "    <style type=\"text/css\">"                                                                        
## [10] "    body {"                                                                                           
## [11] "        background-color: #f0f0f2;"                                                                   
## [12] "        margin: 0;"                                                                                   
## [13] "        padding: 0;"                                                                                  
## [14] "        font-family: \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;"                
## [15] "        "                                                                                             
## [16] "    }"                                                                                                
## [17] "    div {"                                                                                            
## [18] "        width: 600px;"                                                                                
## [19] "        margin: 5em auto;"                                                                            
## [20] "        padding: 50px;"                                                                               
## [21] "        background-color: #fff;"                                                                      
## [22] "        border-radius: 1em;"                                                                          
## [23] "    }"                                                                                                
## [24] "    a:link, a:visited {"                                                                              
## [25] "        color: #38488f;"                                                                              
## [26] "        text-decoration: none;"                                                                       
## [27] "    }"                                                                                                
## [28] "    @media (max-width: 700px) {"                                                                      
## [29] "        body {"                                                                                       
## [30] "            background-color: #fff;"                                                                  
## [31] "        }"                                                                                            
## [32] "        div {"                                                                                        
## [33] "            width: auto;"                                                                             
## [34] "            margin: 0 auto;"                                                                          
## [35] "            border-radius: 0;"                                                                        
## [36] "            padding: 1em;"                                                                            
## [37] "        }"                                                                                            
## [38] "    }"                                                                                                
## [39] "    </style>    "                                                                                     
## [40] "</head>"                                                                                              
## [41] ""                                                                                                     
## [42] "<body>"                                                                                               
## [43] "<div>"                                                                                                
## [44] "    <h1>Example Domain</h1>"                                                                          
## [45] "    <p>This domain is established to be used for illustrative examples in documents. You may use this"
## [46] "    domain in examples without prior coordination or asking for permission.</p>"                      
## [47] "    <p><a href=\"http://www.iana.org/domains/example\">More information...</a></p>"                   
## [48] "</div>"                                                                                               
## [49] "</body>"                                                                                              
## [50] "</html>"                                                                                              
## 
## $error
## NULL
# Call safe_readLines() on "http://asdfasdasdkfjlda"
safe_readLines("http://asdfasdasdkfjlda")
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name or
## address could not be resolved'
## $result
## NULL
## 
## $error
## <simpleError in file(con, "r"): cannot open the connection>

Using map safely

Exercise

One feature of safely() is that it plays nicely with the map() functions. Consider this list containing the two URLs from the last exercise, plus one additional URL to make things more interesting:

urls <- list(
  example = "http://example.org",
  rproj = "http://www.r-project.org",
  asdf = "http://asdfasdasdkfjlda"
)

We are interested in quickly downloading the HTML files at each URL. You might try:

map(urls, readLines)

But it results in an error, Error in file(con, "r") : cannot open the connection, and no output for any of the URLs. Go on, try it!

We can solve this problem by using our safe_readLines() instead.

Instructions

  • Use map() on urls with the safe_readLines() function instead and assign the results to html.
  • Call str() on html to examine the output.
  • Extract the result from one of the two elements that was successful using double square bracket subsetting.
  • Extract the error from the element that was unsuccessful, again using double square bracket subsetting.
# Define safe_readLines()
safe_readLines <- safely(readLines)

# Use the safe_readLines() function with map(): html
html <- map(urls, safe_readLines)
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name or
## address could not be resolved'
# Call str() on html
str(html)
## List of 3
##  $ example:List of 2
##   ..$ result: chr [1:50] "<!doctype html>" "<html>" "<head>" "    <title>Example Domain</title>" ...
##   ..$ error : NULL
##  $ rproj  :List of 2
##   ..$ result: chr [1:119] "<!DOCTYPE html>" "<html lang=\"en\">" "  <head>" "    <meta charset=\"utf-8\">" ...
##   ..$ error : NULL
##  $ asdf   :List of 2
##   ..$ result: NULL
##   ..$ error :List of 2
##   .. ..$ message: chr "cannot open the connection"
##   .. ..$ call   : language file(con, "r")
##   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# Extract the result from one of the successful elements
html[[1]][["result"]]
##  [1] "<!doctype html>"                                                                                      
##  [2] "<html>"                                                                                               
##  [3] "<head>"                                                                                               
##  [4] "    <title>Example Domain</title>"                                                                    
##  [5] ""                                                                                                     
##  [6] "    <meta charset=\"utf-8\" />"                                                                       
##  [7] "    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />"                        
##  [8] "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />"                       
##  [9] "    <style type=\"text/css\">"                                                                        
## [10] "    body {"                                                                                           
## [11] "        background-color: #f0f0f2;"                                                                   
## [12] "        margin: 0;"                                                                                   
## [13] "        padding: 0;"                                                                                  
## [14] "        font-family: \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;"                
## [15] "        "                                                                                             
## [16] "    }"                                                                                                
## [17] "    div {"                                                                                            
## [18] "        width: 600px;"                                                                                
## [19] "        margin: 5em auto;"                                                                            
## [20] "        padding: 50px;"                                                                               
## [21] "        background-color: #fff;"                                                                      
## [22] "        border-radius: 1em;"                                                                          
## [23] "    }"                                                                                                
## [24] "    a:link, a:visited {"                                                                              
## [25] "        color: #38488f;"                                                                              
## [26] "        text-decoration: none;"                                                                       
## [27] "    }"                                                                                                
## [28] "    @media (max-width: 700px) {"                                                                      
## [29] "        body {"                                                                                       
## [30] "            background-color: #fff;"                                                                  
## [31] "        }"                                                                                            
## [32] "        div {"                                                                                        
## [33] "            width: auto;"                                                                             
## [34] "            margin: 0 auto;"                                                                          
## [35] "            border-radius: 0;"                                                                        
## [36] "            padding: 1em;"                                                                            
## [37] "        }"                                                                                            
## [38] "    }"                                                                                                
## [39] "    </style>    "                                                                                     
## [40] "</head>"                                                                                              
## [41] ""                                                                                                     
## [42] "<body>"                                                                                               
## [43] "<div>"                                                                                                
## [44] "    <h1>Example Domain</h1>"                                                                          
## [45] "    <p>This domain is established to be used for illustrative examples in documents. You may use this"
## [46] "    domain in examples without prior coordination or asking for permission.</p>"                      
## [47] "    <p><a href=\"http://www.iana.org/domains/example\">More information...</a></p>"                   
## [48] "</div>"                                                                                               
## [49] "</body>"                                                                                              
## [50] "</html>"
# Extract the error from the element that was unsuccessful
html[[3]][["error"]]
## <simpleError in file(con, "r"): cannot open the connection>

Working with safe output

Exercise

We now have output that contains the HTML for each of the two URLs on which readLines() was successful and the error for the other. But the output isn’t that easy to work with, since the results and errors are buried in the inner-most level of the list.

purrr provides a function transpose() that reshapes a list so the inner-most level becomes the outer-most level. In otherwords, it turns a list-of-lists “inside-out”. Consider the following list:

nested_list <- list(
   x1 = list(a = 1, b = 2),
   x2 = list(a = 3, b = 4)
)

If I need to extract the a element in x1, I could do nested_list[["x1"]][["a"]]. However, if I transpose the list first, the order of subsetting reverses. That is, to extract the same element I could also do transpose(nested_list)[["a"]][["x1"]].

This is really handy for safe output, since we can grab all the results or all the errors really easily.

Instructions

  • Examine the structure of transpose(html).
  • Pull out all the results by subsetting transpose(html) and assign to the variable res.
  • Pull out all the errors by subsetting transpose(html) and assign to the variable errs.
# Define safe_readLines() and html
safe_readLines <- safely(readLines)
html <- map(urls, safe_readLines)
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name or
## address could not be resolved'
# Examine the structure of transpose(html)
str(transpose(html))
## List of 2
##  $ result:List of 3
##   ..$ example: chr [1:50] "<!doctype html>" "<html>" "<head>" "    <title>Example Domain</title>" ...
##   ..$ rproj  : chr [1:119] "<!DOCTYPE html>" "<html lang=\"en\">" "  <head>" "    <meta charset=\"utf-8\">" ...
##   ..$ asdf   : NULL
##  $ error :List of 3
##   ..$ example: NULL
##   ..$ rproj  : NULL
##   ..$ asdf   :List of 2
##   .. ..$ message: chr "cannot open the connection"
##   .. ..$ call   : language file(con, "r")
##   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# Extract the results: res
res <- transpose(html)[["result"]]

# Extract the errors: errs
errs <- transpose(html)[["error"]]

Working with errors and results

Exercise

What you do with the errors and results is up to you. But, commonly you’ll want to collect all the results for the elements that were successful and examine the inputs for all those that weren’t.

Instructions

  • Combine map_lgl() with is_null() to create a logical vector, is_ok, that is TRUE when errs is NULL.
  • Extract the successful results by subsetting res with is_ok.
  • Find the unsuccessful URLs by subsetting urls with !is_ok.
# Initialize some objects
safe_readLines <- safely(readLines)
html <- map(urls, safe_readLines)
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name or
## address could not be resolved'
res <- transpose(html)[["result"]]
errs <- transpose(html)[["error"]]

# Create a logical vector is_ok
is_ok <- map_lgl(errs, is_null)

# Extract the successful results
res[is_ok]
## $example
##  [1] "<!doctype html>"                                                                                      
##  [2] "<html>"                                                                                               
##  [3] "<head>"                                                                                               
##  [4] "    <title>Example Domain</title>"                                                                    
##  [5] ""                                                                                                     
##  [6] "    <meta charset=\"utf-8\" />"                                                                       
##  [7] "    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />"                        
##  [8] "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />"                       
##  [9] "    <style type=\"text/css\">"                                                                        
## [10] "    body {"                                                                                           
## [11] "        background-color: #f0f0f2;"                                                                   
## [12] "        margin: 0;"                                                                                   
## [13] "        padding: 0;"                                                                                  
## [14] "        font-family: \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;"                
## [15] "        "                                                                                             
## [16] "    }"                                                                                                
## [17] "    div {"                                                                                            
## [18] "        width: 600px;"                                                                                
## [19] "        margin: 5em auto;"                                                                            
## [20] "        padding: 50px;"                                                                               
## [21] "        background-color: #fff;"                                                                      
## [22] "        border-radius: 1em;"                                                                          
## [23] "    }"                                                                                                
## [24] "    a:link, a:visited {"                                                                              
## [25] "        color: #38488f;"                                                                              
## [26] "        text-decoration: none;"                                                                       
## [27] "    }"                                                                                                
## [28] "    @media (max-width: 700px) {"                                                                      
## [29] "        body {"                                                                                       
## [30] "            background-color: #fff;"                                                                  
## [31] "        }"                                                                                            
## [32] "        div {"                                                                                        
## [33] "            width: auto;"                                                                             
## [34] "            margin: 0 auto;"                                                                          
## [35] "            border-radius: 0;"                                                                        
## [36] "            padding: 1em;"                                                                            
## [37] "        }"                                                                                            
## [38] "    }"                                                                                                
## [39] "    </style>    "                                                                                     
## [40] "</head>"                                                                                              
## [41] ""                                                                                                     
## [42] "<body>"                                                                                               
## [43] "<div>"                                                                                                
## [44] "    <h1>Example Domain</h1>"                                                                          
## [45] "    <p>This domain is established to be used for illustrative examples in documents. You may use this"
## [46] "    domain in examples without prior coordination or asking for permission.</p>"                      
## [47] "    <p><a href=\"http://www.iana.org/domains/example\">More information...</a></p>"                   
## [48] "</div>"                                                                                               
## [49] "</body>"                                                                                              
## [50] "</html>"                                                                                              
## 
## $rproj
##   [1] "<!DOCTYPE html>"                                                                                                                                                                                                                                                                                                                                     
##   [2] "<html lang=\"en\">"                                                                                                                                                                                                                                                                                                                                  
##   [3] "  <head>"                                                                                                                                                                                                                                                                                                                                            
##   [4] "    <meta charset=\"utf-8\">"                                                                                                                                                                                                                                                                                                                        
##   [5] "    <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">"                                                                                                                                                                                                                                                                                       
##   [6] "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"                                                                                                                                                                                                                                                                        
##   [7] "    <title>R: The R Project for Statistical Computing</title>"                                                                                                                                                                                                                                                                                       
##   [8] ""                                                                                                                                                                                                                                                                                                                                                    
##   [9] "    <link rel=\"icon\" type=\"image/png\" href=\"/favicon-32x32.png\" sizes=\"32x32\" />"                                                                                                                                                                                                                                                            
##  [10] "    <link rel=\"icon\" type=\"image/png\" href=\"/favicon-16x16.png\" sizes=\"16x16\" />"                                                                                                                                                                                                                                                            
##  [11] ""                                                                                                                                                                                                                                                                                                                                                    
##  [12] "    <!-- Bootstrap -->"                                                                                                                                                                                                                                                                                                                              
##  [13] "    <link href=\"/css/bootstrap.min.css\" rel=\"stylesheet\">"                                                                                                                                                                                                                                                                                       
##  [14] "    <link href=\"/css/R.css\" rel=\"stylesheet\">"                                                                                                                                                                                                                                                                                                   
##  [15] ""                                                                                                                                                                                                                                                                                                                                                    
##  [16] "    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->"                                                                                                                                                                                                                                                          
##  [17] "    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->"                                                                                                                                                                                                                                                                      
##  [18] "    <!--[if lt IE 9]>"                                                                                                                                                                                                                                                                                                                               
##  [19] "      <script src=\"https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js\"></script>"                                                                                                                                                                                                                                                             
##  [20] "      <script src=\"https://oss.maxcdn.com/respond/1.4.2/respond.min.js\"></script>"                                                                                                                                                                                                                                                                 
##  [21] "    <![endif]-->"                                                                                                                                                                                                                                                                                                                                    
##  [22] "  </head>"                                                                                                                                                                                                                                                                                                                                           
##  [23] "  <body>"                                                                                                                                                                                                                                                                                                                                            
##  [24] "    <div class=\"container page\">"                                                                                                                                                                                                                                                                                                                  
##  [25] "      <div class=\"row\">"                                                                                                                                                                                                                                                                                                                           
##  [26] "        <div class=\"col-xs-12 col-sm-offset-1 col-sm-2 sidebar\" role=\"navigation\">"                                                                                                                                                                                                                                                              
##  [27] "<div class=\"row\">"                                                                                                                                                                                                                                                                                                                                 
##  [28] "<div class=\"col-xs-6 col-sm-12\">"                                                                                                                                                                                                                                                                                                                  
##  [29] "<p><a href=\"/\"><img src=\"/Rlogo.png\" width=\"100\" height=\"78\" alt = \"R\" /></a></p>"                                                                                                                                                                                                                                                         
##  [30] "<p><small><a href=\"/\">[Home]</a></small></p>"                                                                                                                                                                                                                                                                                                      
##  [31] "<h2 id=\"download\">Download</h2>"                                                                                                                                                                                                                                                                                                                   
##  [32] "<p><a href=\"http://cran.r-project.org/mirrors.html\">CRAN</a></p>"                                                                                                                                                                                                                                                                                  
##  [33] "<h2 id=\"r-project\">R Project</h2>"                                                                                                                                                                                                                                                                                                                 
##  [34] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [35] "<li><a href=\"/about.html\">About R</a></li>"                                                                                                                                                                                                                                                                                                        
##  [36] "<li><a href=\"/logo/\">Logo</a></li>"                                                                                                                                                                                                                                                                                                                
##  [37] "<li><a href=\"/contributors.html\">Contributors</a></li>"                                                                                                                                                                                                                                                                                            
##  [38] "<li><a href=\"/news.html\">What’s New?</a></li>"                                                                                                                                                                                                                                                                                                   
##  [39] "<li><a href=\"/bugs.html\">Reporting Bugs</a></li>"                                                                                                                                                                                                                                                                                                  
##  [40] "<li><a href=\"http://developer.R-project.org\">Development Site</a></li>"                                                                                                                                                                                                                                                                            
##  [41] "<li><a href=\"/conferences.html\">Conferences</a></li>"                                                                                                                                                                                                                                                                                              
##  [42] "<li><a href=\"/search.html\">Search</a></li>"                                                                                                                                                                                                                                                                                                        
##  [43] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [44] "</div>"                                                                                                                                                                                                                                                                                                                                              
##  [45] "<div class=\"col-xs-6 col-sm-12\">"                                                                                                                                                                                                                                                                                                                  
##  [46] "<h2 id=\"r-foundation\">R Foundation</h2>"                                                                                                                                                                                                                                                                                                           
##  [47] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [48] "<li><a href=\"/foundation/\">Foundation</a></li>"                                                                                                                                                                                                                                                                                                    
##  [49] "<li><a href=\"/foundation/board.html\">Board</a></li>"                                                                                                                                                                                                                                                                                               
##  [50] "<li><a href=\"/foundation/members.html\">Members</a></li>"                                                                                                                                                                                                                                                                                           
##  [51] "<li><a href=\"/foundation/donors.html\">Donors</a></li>"                                                                                                                                                                                                                                                                                             
##  [52] "<li><a href=\"/foundation/donations.html\">Donate</a></li>"                                                                                                                                                                                                                                                                                          
##  [53] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [54] "<h2 id=\"help-with-r\">Help With R</h2>"                                                                                                                                                                                                                                                                                                             
##  [55] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [56] "<li><a href=\"/help.html\">Getting Help</a></li>"                                                                                                                                                                                                                                                                                                    
##  [57] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [58] "<h2 id=\"documentation\">Documentation</h2>"                                                                                                                                                                                                                                                                                                         
##  [59] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [60] "<li><a href=\"http://cran.r-project.org/manuals.html\">Manuals</a></li>"                                                                                                                                                                                                                                                                             
##  [61] "<li><a href=\"http://cran.r-project.org/faqs.html\">FAQs</a></li>"                                                                                                                                                                                                                                                                                   
##  [62] "<li><a href=\"http://journal.r-project.org\">The R Journal</a></li>"                                                                                                                                                                                                                                                                                 
##  [63] "<li><a href=\"/doc/bib/R-books.html\">Books</a></li>"                                                                                                                                                                                                                                                                                                
##  [64] "<li><a href=\"/certification.html\">Certification</a></li>"                                                                                                                                                                                                                                                                                          
##  [65] "<li><a href=\"/other-docs.html\">Other</a></li>"                                                                                                                                                                                                                                                                                                     
##  [66] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [67] "<h2 id=\"links\">Links</h2>"                                                                                                                                                                                                                                                                                                                         
##  [68] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [69] "<li><a href=\"http://www.bioconductor.org\">Bioconductor</a></li>"                                                                                                                                                                                                                                                                                   
##  [70] "<li><a href=\"/other-projects.html\">Related Projects</a></li>"                                                                                                                                                                                                                                                                                      
##  [71] "<li><a href=\"/gsoc.html\">GSoC</a></li>"                                                                                                                                                                                                                                                                                                            
##  [72] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [73] "</div>"                                                                                                                                                                                                                                                                                                                                              
##  [74] "</div>"                                                                                                                                                                                                                                                                                                                                              
##  [75] "        </div>"                                                                                                                                                                                                                                                                                                                                      
##  [76] "        <div class=\"col-xs-12 col-sm-7\">"                                                                                                                                                                                                                                                                                                          
##  [77] "        <h1>The R Project for Statistical Computing</h1>"                                                                                                                                                                                                                                                                                            
##  [78] "<h2 id=\"getting-started\">Getting Started</h2>"                                                                                                                                                                                                                                                                                                     
##  [79] "<p>R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. To <strong><a href=\"http://cran.r-project.org/mirrors.html\">download R</a></strong>, please choose your preferred <a href=\"http://cran.r-project.org/mirrors.html\">CRAN mirror</a>.</p>"
##  [80] "<p>If you have questions about R like how to download and install the software, or what the license terms are, please read our <a href=\"http://cran.R-project.org/faqs.html\">answers to frequently asked questions</a> before you send an email.</p>"                                                                                              
##  [81] "<h2 id=\"news\">News</h2>"                                                                                                                                                                                                                                                                                                                           
##  [82] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [83] "<li><p>You can now support the R Foundation with a renewable subscription as a <a href=\"https://www.r-project.org/foundation/donations.html\">supporting member</a></p></li>"                                                                                                                                                                       
##  [84] "<li><p><a href=\"https://cran.r-project.org/src/base/R-3\"><strong>R version 3.5.1 (Feather Spray)</strong></a> has been released on 2018-07-02.</p></li>"                                                                                                                                                                                           
##  [85] "<li><p>The R Foundation has been awarded the Personality/Organization of the year 2018 award by the professional association of German market and social researchers.</p></li>"                                                                                                                                                                      
##  [86] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [87] "<h2 id=\"news-via-twitter\">News via Twitter</h2>"                                                                                                                                                                                                                                                                                                   
##  [88] "<a class=\"twitter-timeline\""                                                                                                                                                                                                                                                                                                                       
##  [89] " href=\"https://twitter.com/_R_Foundation?ref_src=twsrc%5Etfw\""                                                                                                                                                                                                                                                                                     
##  [90] " data-width=\"400\""                                                                                                                                                                                                                                                                                                                                 
##  [91] " data-show-replies=\"false\""                                                                                                                                                                                                                                                                                                                        
##  [92] " data-chrome=\"noheader,nofooter,noborders\""                                                                                                                                                                                                                                                                                                        
##  [93] " data-dnt=\"true\""                                                                                                                                                                                                                                                                                                                                  
##  [94] " data-tweet-limit=\"3\">News from the R Foundation</a>"                                                                                                                                                                                                                                                                                              
##  [95] "<script async"                                                                                                                                                                                                                                                                                                                                       
##  [96] " src=\"https://platform.twitter.com/widgets.js\""                                                                                                                                                                                                                                                                                                    
##  [97] " charset=\"utf-8\"></script>"                                                                                                                                                                                                                                                                                                                        
##  [98] "<!--- (Boilerplate for release run-in)"                                                                                                                                                                                                                                                                                                              
##  [99] "-   [**R version 3.1.3 (Smooth Sidewalk) prerelease versions**](http://cran.r-project.org/src/base-prerelease) will appear starting February 28. Final release is scheduled for 2015-03-09."                                                                                                                                                         
## [100] "-->"                                                                                                                                                                                                                                                                                                                                                 
## [101] "        </div>"                                                                                                                                                                                                                                                                                                                                      
## [102] "      </div>"                                                                                                                                                                                                                                                                                                                                        
## [103] "      <div class=\"raw footer\">"                                                                                                                                                                                                                                                                                                                    
## [104] "        &copy; The R Foundation. For queries about this web site, please contact"                                                                                                                                                                                                                                                                    
## [105] "\t<script type='text/javascript'>"                                                                                                                                                                                                                                                                                                                    
## [106] "<!--"                                                                                                                                                                                                                                                                                                                                                
## [107] "var s=\"=b!isfg>#nbjmup;xfcnbtufsAs.qspkfdu/psh#?uif!xfcnbtufs=0b?\";"                                                                                                                                                                                                                                                                               
## [108] "m=\"\"; for (i=0; i<s.length; i++) {if(s.charCodeAt(i) == 28){m+= '&';} else if (s.charCodeAt(i) == 23) {m+= '!';} else {m+=String.fromCharCode(s.charCodeAt(i)-1);}}document.write(m);//-->"                                                                                                                                                        
## [109] "\t</script>;"                                                                                                                                                                                                                                                                                                                                         
## [110] "        for queries about R itself, please consult the "                                                                                                                                                                                                                                                                                             
## [111] "        <a href=\"help.html\">Getting Help</a> section."                                                                                                                                                                                                                                                                                             
## [112] "      </div>"                                                                                                                                                                                                                                                                                                                                        
## [113] "    </div>"                                                                                                                                                                                                                                                                                                                                          
## [114] "    <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->"                                                                                                                                                                                                                                                                                  
## [115] "    <script src=\"https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js\"></script>"                                                                                                                                                                                                                                                     
## [116] "    <!-- Include all compiled plugins (below), or include individual files as needed -->"                                                                                                                                                                                                                                                            
## [117] "    <script src=\"/js/bootstrap.min.js\"></script>"                                                                                                                                                                                                                                                                                                  
## [118] "  </body>"                                                                                                                                                                                                                                                                                                                                           
## [119] "</html>"
# Find the URLs that were unsuccessful
urls[!is_ok]
## $asdf
## [1] "http://asdfasdasdkfjlda"

Video: Maps over multiple arguments

Getting started

Exercise

We’ll use random number generation as an example throughout the remaining exercises in this chapter. To get started, let’s imagine simulating 5 random numbers from a Normal distribution. You can do this in R with the rnorm() function. For example, to generate 5 random numbers from a Normal distribution with mean zero, we can do:

rnorm(n = 5)

Now, imagine you want to do this three times, but each time with a different sample size. You already know how! Let’s use the map() function to get it done.

Instructions

  • Create a list n containing the values: 5, 10, and 20.
  • Use map() to iterate over n, each time applying the function rnorm().
# Create a list n containing the values: 5, 10, and 20
n = list(5, 10, 20)

# Call map() on n with rnorm() to simulate three samples
map(n, rnorm)
## [[1]]
## [1] -1.1469904  2.4975007 -1.9272133  0.7783345  0.9114511
## 
## [[2]]
##  [1] -0.2337036  1.8748620 -1.5758529 -2.3189536 -0.7428074 -1.8256433
##  [7] -0.4730371  0.4290537  1.5295782 -1.3502995
## 
## [[3]]
##  [1] -0.99018610  1.04436651 -0.30661648 -0.34647854 -0.80708796
##  [6] -0.46017581  0.59775163  2.06227455  0.20230164  1.03245125
## [11] -0.24261787  0.70598204 -0.42377479  1.34759524  0.44965158
## [16]  0.79605774  0.01229766 -1.32845745  0.15156184  0.81738876

Mapping over two arguments

Exercise

Ok, but now imagine we don’t just want to vary the sample size, we also want to vary the mean. The mean can be specified in rnorm() by the argument mean. Now there are two arguments to rnorm() we want to vary: n and mean.

The map2() function is designed exactly for this purpose; it allows iteration over two objects. The first two arguments to map2() are the objects to iterate over and the third argument .f is the function to apply.

Let’s use map2() to simulate three samples with different sample sizes and different means.

Instructions

  • Create a list mu containing the values: 1, 5, and 10.
  • Edit the map() call to use map2() with both n and mu.
# Initialize n
n <- list(5, 10, 20)

# Create a list mu containing the values: 1, 5, and 10
mu = list(1, 5, 10)

# Edit to call map2() on n and mu with rnorm() to simulate three samples
map2(n, mu, rnorm)
## [[1]]
## [1]  0.9758931 -0.1812981  1.5555424  1.9331116  1.0600187
## 
## [[2]]
##  [1] 3.727427 4.307103 5.489234 4.603160 3.738368 5.066826 4.053531
##  [8] 6.393750 3.698077 4.475885
## 
## [[3]]
##  [1]  8.762064  8.326619  9.532079  9.957813 10.210206  8.198173  9.687371
##  [8] 10.155119  8.206832  8.471932  9.117131  9.935629 10.920511 11.419466
## [15] 11.655665  9.686627  9.728569  9.697320  8.981511 10.536834

Mapping over more than two arguments

Exercise

But wait, there’s another argument to rnorm() we might want to vary: sd, the standard deviation of the Normal distribution. You might think there is a map3() function, but there isn’t. Instead purrr provides a pmap() function that iterates over 2 or more arguments.

First, let’s take a look at pmap() for the situation we just solved: iterating over two arguments. Instead of providing each otem to iterate as arguments, pmap() takes a list of arguments as its input. For examle, we could replicate our previous example, iterating over both n and mu with the following:

n <- list(5, 10, 20)
mu <- list(1, 5, 10)

pmap(list(n, mu), rnorm)

Notice how we had to put our two items to iterate over (n and mu) into a list.

Let’s expand this code to iterate over varying standard deviations too.

Instructions

  • Create a list sd with the values: 0.1, 1 and 0.1.
  • Edit your call to pmap() to also iterate over sd.
# Initialize n and mu
n <- list(5, 10, 20)
mu <- list(1, 5, 10)

# Create a sd list with the values: 0.1, 1 and 0.1
sd <- list(0.1, 1, 0.1)

# Edit this call to pmap() to iterate over the sd list as well
pmap(list(n, mu, sd), rnorm)
## [[1]]
## [1] 1.0528919 1.1648386 0.8886547 1.0393355 1.0712239
## 
## [[2]]
##  [1] 4.152407 4.597224 6.697670 5.089153 4.062108 4.383448 6.371211
##  [8] 5.521881 5.859346 5.762934
## 
## [[3]]
##  [1] 10.029109  9.943092  9.886960  9.957687  9.954770  9.777668  9.965139
##  [8] 10.055390  9.972113  9.948000 10.044777 10.081238 10.066976  9.972944
## [15] 10.038387  9.996991  9.980090  9.874104 10.084952  9.891329

Argument matching

Exercise

Compare the following two calls to pmap() (run them in the console and compare their output too!):

pmap(list(n, mu, sd), rnorm)
pmap(list(mu, n, sd), rnorm)

What’s the difference? By default pmap() matches the elements of the list to the arguments in the function by position. In the first case, n to the n argument of rnorm(), mu to the mean argument of rnorm(), and sd to the sd argument of rnorm(). In the second case mu gets matched to the n argument of rnorm(), which is clearly not what we intended!

Instead of relying on this positional matching, a safer alternative is to provide names in our list. The name of each element should be the argument name we want to match it to.

Let’s fix up that second call.

Instructions

Without changing their ordering, simply name the elements of the argument list to properly match the arguments of rnorm().

# Name the elements of the argument list
pmap(list(mean = mu, n = n, sd = sd), rnorm)
## [[1]]
## [1] 1.1519608 1.0735480 1.0280553 0.9261175 0.9922490
## 
## [[2]]
##  [1] 4.131899 4.949653 5.228590 4.832954 3.812897 5.707803 3.509142
##  [8] 2.491301 6.604584 5.137850
## 
## [[3]]
##  [1]  9.964222  9.961345 10.080620  9.974861  9.760178 10.160124  9.860835
##  [8]  9.882635 10.047251  9.820728  9.937227 10.065364  9.993196 10.101329
## [15]  9.978600  9.985901 10.003213  9.904783  9.959685  9.957789

Mapping over functions and their arguments

Exercise

Sometimes it’s not the arguments to a function you want to iterate over, but a set of functions themselves. Imagine that instead of varying the parameters to rnorm() we want to simulate from different distributions, say, using rnorm(), runif(), and rexp(). How do we iterate over calling these functions?

In purrr, this is handled by the invoke_map() function. The first argument is a list of functions. In our example, something like:

f <- list("rnorm", "runif", "rexp")

The second argument specifies the arguments to the functions. In the simplest case, all the functions take the same argument, and we can specify it directly, relying on ... to pass it to each function. In this case, call each function with the argument n = 5:

invoke_map(f, n = 5)

In more complicated cases, the functions may take different arguments, or we may want to pass different values to each function. In this case, we need to supply invoke_map() with a list, where each element specifies the arguments to the corresponding function.

Let’s use this approach to simulate three samples from the following three distributions: Normal(10, 1), Uniform(0, 5), and Exponential(5).

Instructions

We’ve given you some code to get you started.

  • Add min and max elements to runif_params with values 0 and 5 respectively.
  • Add a rate element to rexp_params with value 5.
  • Call invoke_map() of f() using the params list as the second argument, keeping n = 5 as a global argument.
# Define list of functions
f <- list("rnorm", "runif", "rexp")

# Parameter list for rnorm()
rnorm_params <- list(mean = 10)

# Add a min element with value 0 and max element with value 5
runif_params <- list(min = 0, max = 5)

# Add a rate element with value 5
rexp_params <- list(rate = 5)

# Define params for each function
params <- list(
  rnorm_params,
  runif_params,
  rexp_params
)

# Call invoke_map() on f supplying params as the second argument
invoke_map(f, params, n = 5)
## [[1]]
## [1]  9.898867  9.106245 10.191681  9.598026  9.432234
## 
## [[2]]
## [1] 2.3012579 0.7346494 2.6671266 0.3859850 0.9904662
## 
## [[3]]
## [1] 0.53597570 0.45202542 0.07089529 0.09059730 0.09841491

Video: Maps with side effects

Walk

Exercise

walk() operates just like map() except it’s designed for functions that don’t return anything. You use walk() for functions with side effects like printing, plotting or saving.

Let’s check that our simulated samples are in fact what we think they are by plotting a histogram for each one.

Instructions

We’ve included code from our simulation of three samples from three different distributions, except we’ve bumped up the sample size to 50. We’ve also added some names to f and params to help us remember what matches up with what.

  • Assign the simulated samples to sims.
  • Can walk() on sims with the hist() function to create a histogram of each sample.
# Define list of functions
f <- list(Normal = "rnorm", Uniform = "runif", Exp = "rexp")

# Define params
params <- list(
  Normal = list(mean = 10),
  Uniform = list(min = 0, max = 5),
  Exp = list(rate = 5)
)

# Assign the simulated samples to sims
sims <- invoke_map(f, params, n = 50)

# Use walk() to make a histogram of each element in sims
walk(sims, hist)

Walking over two or more arguments

Exercise

Those histograms were pretty good, but they really needed better breaks for the bins on the x-axis. That means we need to vary two arguments to hist(): x and breaks. Remember map2()? That allowed us to iterate over two arguments. Guess what? There is a walk2(), too!

Let’s use walk2() to improve those histograms with better breaks.

Instructions

We’ve loaded sims in your workplace.

  • The default value for the breaks argument to hist() is "Sturges". Replace "Sturges" in the breaks_list list with reasonable breaks for the histograms. Let’s use seq(6, 16, 0.5) for the Normal, seq(0, 5, 0.25) for the Uniform and seq(0, 1.5, 0.1) for the Exponential.
  • Use walk2() to create a histogram for each sample with the breaks in breaks_list.
# Replace "Sturges" with reasonable breaks for each sample
breaks_list <- list(
  Normal = seq(6, 16, 0.5),
  Uniform = seq(0, 5, 0.25),
  Exp = seq(0, 1.5, 0.1)
)

# Use walk2() to make histograms with the right breaks
walk2(sims, breaks_list, hist)

Putting together writing functions and walk

Exercise

In the previous exercise, we hard-coded the breaks, but that was a little lazy. Those breaks probably won’t be great if we change the parameters of our simulation.

A better idea would be to generate reasonable breaks based on the actual values in our simulated samples. This is a great chance to review our function writing skills and combine our own function with purrr.

Let’s start by writing our own function find_breaks(), which copies the default breaks in the ggplot2 package: break the range of the data in 30 bins.

How do we start? Simple, of course! Here’s a snippet of code that works for the first sample:

rng <- range(sims[[1]], na.rm = TRUE)
seq(rng[1], rng[2], length.out = 30)

Your job in this exercise is to turn that snippet into a function.

In the next exercise, we’ll combine find_breaks() with map() and walk2() to create histograms with sensible breaks.

Instructions

  • Turn the snippet above into a function called find_breaks(), which takes a single argument x and return the sequence of breaks.
  • Check that your function works by calling find_breaks() on sims[[1]].
# Turn this snippet into find_breaks()
find_breaks <- function(x) {
  rng <- range(x, na.rm = TRUE)
  seq(rng[1], rng[2], length.out = 30)
}

# Call find_breaks() on sims[[1]]
find_breaks(sims[[1]])
##  [1]  7.701320  7.833121  7.964923  8.096724  8.228525  8.360326  8.492128
##  [8]  8.623929  8.755730  8.887532  9.019333  9.151134  9.282935  9.414737
## [15]  9.546538  9.678339  9.810141  9.941942 10.073743 10.205544 10.337346
## [22] 10.469147 10.600948 10.732750 10.864551 10.996352 11.128154 11.259955
## [29] 11.391756 11.523557

Nice breaks for all

Exercise

Now that we have find_breaks(), we can find nice breaks for all the samples using map(). Then, pass the result into walk2() to get nice (but custom breaks) for our samples.

Instructions

  • Use map() to iterate find_breaks() over sims and assign the result to nice_breaks.
  • Use nice_breaks as the second argument to walk2() to iterate over both the simulations and calculated breaks to plot histograms.
# Use map() to iterate find_breaks() over sims: nice_breaks
nice_breaks <- map(sims, find_breaks)

# Use nice_breaks as the second argument to walk2()
walk2(sims, nice_breaks, hist)

Walking with many arguments: pwalk

Exercise

Ugh! Nice breaks but those plots had UUUUGLY labels and tytles. The x-axis labels are easy to fix if we don’t mind every plot having its x-axis labeled the same way. We can use the ... argument to any of the map() or walk() functions to pass in further arguments to the function .f. In this case, we might decide we don’t want any labels on the x-axis, in which case we need to pass an empty string to the xlab argument of hist():

walk2(sims, nice_breaks, hist, xlab = "")

But, what about the titles? We don’t want them to be the same for each plot. How can we iterate over the arguments x, breaks and main? You guessed it, there is a pwalk() function that works just like pmap().

Let’s use pwalk() to tidy up these plots. Also, let’s increase our sample size to 1000.

Instructions

  • Increase the sample size to 1000.
  • Create a vector nice_titles that contains the character strings: "Normal(10, 1)", Uniform(0, 5)", "Exp(5)".
  • Use pwalk() instead of walk2() to iterate over the x, breaks and main arguments to hist(). Like for pmap(), the first argument to pwalk() should be a list() of arguments to hist() using matching by name. Keep the xlab = "" argument as-is to keep things clean.
# Increase the sample size to 1000
sims <- invoke_map(f, params, n = 1000)

# Compute nice_breaks (don't change this)
nice_breaks <- map(sims, find_breaks)

# Create a vector of nice_titles
nice_titles <- c("Normal(10, 1)", "Uniform(0, 5)", "Exp(5)")

# Use pwalk() instead of walk2()
pwalk(list(sims, nice_breaks, main = nice_titles), hist, xlab = "")

Walking with pipes

Exercise

One of the nice things about the walk() functions is that they return the object you passed to them. This means they can easily be used in pipelins (a pipeline is just a short way of saying “a statement with lots of pipes”).

To illustrate, we’ll return to our first example of making histograms for each sample:

walk(sims, hist)

Take a look at what gets returned:

tmp <- walk(sims, hist)
str(tmp)

It’s our original sims object. That means we can pipe the sims object along to other functions. For example, we might want some basic summary statistics on each sample as well as our histograms.

Instructions

We’ve converted walk(sims, hist) to a piped statement: sims %>% walk(hist). Pipe the result into map using summary() as the .f argument.

# Pipe this along to map(), using summary() as .f
sims %>%
  walk(hist) %>%
  map(summary)

## $Normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.392   9.367  10.061  10.045  10.713  13.328 
## 
## $Uniform
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.002306 1.412243 2.531916 2.558309 3.764220 4.998873 
## 
## $Exp
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0003988 0.0548013 0.1339406 0.1945181 0.2686363 1.7022091

5. Robust functions

In this chapter we’ll focus on writing functions that don’t surprise you or your users. We’ll expose you to some functions that work 95% of the time, and 5% of the time fail in surprising ways. You’ll learn which functions you should avoid using inside a function and which you should use with care.

Video: Robust functions

An error is better than a surprise

Exercise

Recall our both_na() function from Chapter2, that finds the number of entries where vectors x and y both have missing values:

both_na <- function(x, y) {
  sum(is.na(x) & is.na(y))
}

We had an example where the behaviour was a little suprising:

x <- c(NA, NA, NA)
y <- c( 1, NA, NA, NA)
both_na(x, y)

The function works and returns 3, but we certainly didn’t design this function with the idea that people could pass in different length arguments.

Using stopifnot() is a quick way to have your functions stop, if a condition isn’t met. stopifnot() takes logical expressions as arguments and if any are FALSE an error will occur.

Instructions

  • Add a call to stopifnot() to both_na() that checks arguments x and y have the same length.
  • Run the call to both_na() to verify it returns an error.
# Define troublesome x and y
x <- c(NA, NA, NA)
y <- c( 1, NA, NA, NA)

both_na <- function(x, y) {
  # Add stopifnot() to check length of x and y
  stopifnot(length(x) == length(y))
  
  sum(is.na(x) & is.na(y))
}

# Call both_na() on x and y
both_na(x, y)
## Error: length(x) == length(y) is not TRUE

An informative error is even better

Exercise

Using stop() instead of stopifnot() allows you to specify a more informative error message. Recall the general pattern for using stop() is:

if (condition) {
  stop("Error", call. = FALSE)
}

Writing good error messages is an important part of writing a good function! We recommend your error tells the user what should betrue, not what is false. For example, here a good error would be "x and y must have the same length", rather than the bad error "x and y don't have the same length".

Let’s use this pattern to write a better check for the length of x and y.

Instructions

  • Replace condition with a logical statement that evaluates to TRUE when x and y have different lengths.
  • Change the error message to "x and y must have the same length".
  • Run the call to both_na() to verigy it returns a more informative error.
# Define troublesome x and y
x <- c(NA, NA, NA)
y <- c( 1, NA, NA, NA)

both_na <- function(x, y) {
  # Replace condition with logical
  if (length(x) != length(y)) {
    # Replace "Error" with better message
    stop("x and y must have the same length", call. = FALSE)
  }
  sum(is.na(x) & is.na(y))
}

# Call both_na on x and y
both_na(x, y)
## Error: x and y must have the same length

A different kind of surprise: side effects

Side effects describe the things that happen when you run a function that alters the state of your R session. If foo() is a function with no side effects (a.k.a. pure), then when we run x <- foo(), the only change we expect is that the variable x now has a new value. No other variables in the global environment should be changed or created, no output should be printed, no plots displayed, no files saved, no options changed. We know exactly the changes to the state of the session just be reading the call to the function.

Can you identify which of these functions doesn’t have side effects?

show_missings <- function(x) {
  n <- sum(is.na(x))
  cat("Missing values: ", n, "\n", sep = "")
}

replace_missings <- function(x, replacement) {
  x[is.na(x)] <- replacement
}

plot_missings <- function(x) {
  plot(seq_along(x), is.na(x))
  x
}

exclude_missings <- function(x) {
  options(na.action = "na.exclude")
}
Possible Answers
  • show_missings()
  • replace_missings()
  • plot_missings()
  • exclude_missings()

Correct, great job! Of course functions with side effects are crucial for data analysis. You need to be aware of them, and deliberate in their usage. It’s ok to use them if the side effect is desired, but don’t surprise users with unexpected side effects.

Video: Unstable types

sapply is another common culprit

Exercise

sapply() is another common offender returning unstable types. The type of output returned from sapply() depends on the type of input.

Consider the following data frame and two calls to sapply():

df <- data.frame(
  a = 1L,
  b = 1.5,
  y = Sys.time(),
  z = ordered(1)
)

A <- sapply(df[1:4], class)
B <- sapply(df[3:4], class)

What type of objects will be A and B be?

Possible Answers
  • A and B will both be character vectors.
  • A and B will both be lists.
  • A will be a list, B will be a character vector
  • A, will be a list, B will be a character matrix

Using purrr solves the problem

Exercise

This unpredictable behavior is a sign that you shouldn’t rely on sapply() inside your own functions.

So, what do you do? Use alternate functions that are type consistent! And you already know a whole set: the map() functions in purrr.

In this example, when we call class() on the columns of the data frame we are expecting character output, so our function of choice should be: map_chr():

df <- data.frame(
  a = 1L,
  b = 1.5,
  y = Sys.time(),
  z = ordered(1)
)

A <- map_chr(df[1:4], class)
B <- map_chr(df[3:4], class)

Except that gives us errors. This is a good thing! It alerts us that our assumption (that class() would return purely character output) is wrong.

Let’s look at a couple of solutions. First, we could use map() instead of map_chr(). Our result will always be a list, no matter the input.

Instructions

The first two chunks give some sapply() calls, and demonstrate the type inconsistency by calling str() on each result.

  • Define X, Y and Z using map() instead of sapply().
  • Call str() on your X, Y and Z to demonstrate type consistency of map().
df <- data.frame(
  a = 1L,
  b = 1.5,
  y = Sys.time(),
  z = ordered(1)
)

# sapply calls
A <- sapply(df[1:4], class) 
B <- sapply(df[3:4], class)
C <- sapply(df[1:2], class)

# Demonstrate type inconsistency
str(A)
## List of 4
##  $ a: chr "integer"
##  $ b: chr "numeric"
##  $ y: chr [1:2] "POSIXct" "POSIXt"
##  $ z: chr [1:2] "ordered" "factor"
str(B)
##  chr [1:2, 1:2] "POSIXct" "POSIXt" "ordered" "factor"
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "y" "z"
str(C)
##  Named chr [1:2] "integer" "numeric"
##  - attr(*, "names")= chr [1:2] "a" "b"
# Use map() to define X, Y and Z
X <- map(df[1:4], class) 
Y <- map(df[3:4], class)
Z <- map(df[1:2], class)

# Use str() to check type inconsistency
str(X)
## List of 4
##  $ a: chr "integer"
##  $ b: chr "numeric"
##  $ y: chr [1:2] "POSIXct" "POSIXt"
##  $ z: chr [1:2] "ordered" "factor"
str(Y)
## List of 2
##  $ y: chr [1:2] "POSIXct" "POSIXt"
##  $ z: chr [1:2] "ordered" "factor"
str(Z)
## List of 2
##  $ a: chr "integer"
##  $ b: chr "numeric"

A type consistent solution

Exercise

If we wrap our solution into a function, we can be confident that this function will always return a list because we’ve used a type consistent function, map():

col_classes <- function(df) {
  map(df, class)
}

But what if you wanted this function to always return a character string?

One option would be to decide what should happen if class() returns something longer than length 1. For example, we might simply take the first element of the vector returned by class().

Instructions

  • Assign the body of our previous function to the variable class_list.
  • Use map_chr() along with the numeric subsetting shortcut, to extract the first element from every item in class_list.
  • Run the final three lines to verify our new function always returns a character vector.
col_classes <- function(df) {
  # Assign list output to class_list
  class_list <- map(df, class)
  #Use map_chr() to extract first element in class_list
  map_chr(class_list, 1)
}

# Check that our new function is type consistent
df %>% col_classes() %>% str()
##  Named chr [1:4] "integer" "numeric" "POSIXct" "ordered"
##  - attr(*, "names")= chr [1:4] "a" "b" "y" "z"
df[3:4] %>% col_classes() %>% str()
##  Named chr [1:2] "POSIXct" "ordered"
##  - attr(*, "names")= chr [1:2] "y" "z"
df[1:2] %>% col_classes() %>% str()
##  Named chr [1:2] "integer" "numeric"
##  - attr(*, "names")= chr [1:2] "a" "b"

Or fail early if something goes wrong

Exercise

Another option would be to simply fail. We could rely on map_chr()’s type consistency to fail for us:

col_classes <- function(df) {
  map_chr(df, class)
}

df %>% col_classes() %>% str()

Or, check the condition ourselves and return an informative error message. We’ll implement this approach in this exercise.

As you write more functions, you’ll find you often come across this tension between implementing a function that does something sensible when something surprising happens, or simply fails when something surprising happens. Our recommendation is to fail when you are writing functions that you’ll use behind the scenes for programming and to do something sensible when writing functions for users to use interactively.

(And by the way, flatten_chr() is yet another useful function in purrr. It takes a list and removes its hierarchy. The suffix _chr indicates that this is another type consistent function, and will either return a character string or an error message.)

Instructions

  • Replace condition with a logical statement that uses any() and map_dbl() to check if any element in class_list has length greater than 1.
  • Run the final three lines to verify an informative error is thrown.
col_classes <- function(df) {
  class_list <- map(df, class)
  
  # Add a check that no element of class_list has length > 1
  if (any(map_dbl(class_list, length) > 1)) {
    stop("Some columns have more than one class", call. = FALSE)
  }
  
  # Use flatten_chr() to return a character vector
  flatten_chr(class_list)
}

# Check that our new function is type consistent
df %>% col_classes() %>% str()
## Error: Some columns have more than one class
df[3:4] %>% col_classes() %>% str()
## Error: Some columns have more than one class
df[1:2] %>% col_classes() %>% str()
##  chr [1:2] "integer" "numeric"

Video: Non-standard evaluation

Programming with NSE functions

Exercise

Let’s take a look at a function that uses the non-standard evaluation (NSE) function filter() from the dplyr package:

big_x <- function(df, threshold) {
  dplyr::filter(df, x > threshold)
}

This big_x() function attempts to return all rows in df where the x column exceeds a certain threshold. Let’s get a feel for how it might be used.

Instructions

We’ve placed diamonds_sub, a 20 row subset of the diamonds data from the ggplot2 package in your workspace. Use big_x() to find all rows in diamonds_sub where the x column is greater than 7.

# Use big_x() to find rows in diamonds_sub where x > 7
big_x(diamonds_sub, 7)
## Warning: package 'bindrcpp' was built under R version 3.4.4
## # A tibble: 2 x 10
##   carat   cut color clarity depth table price     x     y     z
##   <dbl> <int> <int>   <int> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  1.35     3     4       4  60.9    54 10471  7.18  7.15  4.36
## 2  1.63     3     3       1  62      55  7229  7.57  7.5   4.68

When things go wrong

Exercise

Now, let’s see how this function might fail. There are two instances in which the non-standard evaluation of filter() could cause surprising results:

  1. The x column doesn’t exist in df.
  2. There is a threshold column in df.

Let’s illustrate these failures. In each case we’ll use big_x() in the same way as the previous exercise, so we should expect the same output. However, not only do we get unexpected outputs, there is no indication (i.e. error message) that lets us know something might have gone wrong.

Instructions

  • Create a variable x and give it the value 1.
  • Use big_x() to find all rows in diamonds_sub where the x column is greater than 7.
  • Create a threshold column in diamonds_sub with the value 100.
  • Use big_x() to find all rows in diamonds_sub where the x column is greater than 7.
# Remove the x column from diamonds
diamonds_sub$x <- NULL

# Create variable x with value 1
x <- 1

# Use big_x() to find rows in diamonds_sub where x > 7
big_x(diamonds_sub, 7)
## # A tibble: 0 x 9
## # ... with 9 variables: carat <dbl>, cut <int>, color <int>,
## #   clarity <int>, depth <dbl>, table <dbl>, price <int>, y <dbl>, z <dbl>
# Create a threshold column with value 100
diamonds_sub$threshold <- 100

# Use big_x() to find rows in diamonds_sub where x > 7
big_x(diamonds_sub, 7)
## # A tibble: 0 x 10
## # ... with 10 variables: carat <dbl>, cut <int>, color <int>,
## #   clarity <int>, depth <dbl>, table <dbl>, price <int>, y <dbl>,
## #   z <dbl>, threshold <dbl>

What to do?

Exercise

To avoid the problems caused by non-standard evaluation functions, you could avoid using them. In our example, we could achieve the same results by using standard subsetting (i.e. []) instead of filter(). For more insight into dealing with NSE and how to write your own non-standard evaluation functions, we recommend reading Hadley’s vignette on the topic. Also, programming with the NSE functions in dplyr will be easier in a future version.

If you do need to use non-standard evaluation functions, it’s up to you to provide protection against the problem cases. That means you need to know what the problem cases are, to check for them, and to fail explicitly.

To see what that might look like, let’s rewrite big_x() to fail for our problem cases.

Instructions

Write a check for each of the following:

  • If x is not in names(df), stop with the message "df must contain variable called x".
  • If threshold is in names(df), stop with the message "df must not contain variable called threshold".

Remember to use the argument call. = FALSE in each call to stop() so that the call is not a part of the error message.

big_x <- function(df, threshold) {
  # Write a check for x not being in df
  if (!"x" %in% names(df)) {
    stop("df must contain variable called x", call. = FALSE)
  }
  
  # Write a check for threshold being in df
  if ("threshold" %in% names (df)) {
    stop("df must not contain variable called threshold", call. = FALSE)
  }
  
  dplyr::filter(df, x > threshold)
}

Video: Hidden arguments

A hidden dependence

Exercise

A classic example of a hidden dependence is the stringsAsFactors argument to the read.csv() function (and a few other data frame functions.)

When you see the following code, you don’t know exactly what the result will be:

pools <- read.csv("swimming_pools.csv")

That’s because if the argument stringsAsFactors isn’t specified, it inherits its value from getOption("stringsAsFactors"), a global option that a user may change.

Just to prove that this is the case, let’s illustrate the problem.

Instructions

  • Use read.csv() as above to read in swimming_pools.csv and save it to the variable pools.
  • Examine the structure of pools and notice how the columns Name and Address are factors.
  • Change the global stringsAsFactors option to FALSE: options(stringsAsFactors = FALSE).
  • Use read.csv() as above to read in the same CSV file to the variable pools2.
  • Examine the structure of pools2, and notice how the columns Name and Address are now characters.
# Read in the swimming_pools.csv to pools
url_csv <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/swimming_pools.csv"
pools <- read.csv(url_csv)

# Examine the structure of pools
str(pools)
## 'data.frame':    20 obs. of  4 variables:
##  $ Name     : Factor w/ 20 levels "Acacia Ridge Leisure Centre",..: 1 2 3 4 5 6 19 7 8 9 ...
##  $ Address  : Factor w/ 20 levels "1 Fairlead Crescent, Manly",..: 5 20 18 10 9 11 6 15 12 17 ...
##  $ Latitude : num  -27.6 -27.6 -27.6 -27.5 -27.4 ...
##  $ Longitude: num  153 153 153 153 153 ...
# Change the global stringsAsFactors option to FALSE
options(stringsAsFactors = FALSE)

# Read in the swimming_pools.csv to pools2
pools2 <- read.csv(url_csv)

# Examine the structure of pools2
str(pools2)
## 'data.frame':    20 obs. of  4 variables:
##  $ Name     : chr  "Acacia Ridge Leisure Centre" "Bellbowrie Pool" "Carole Park" "Centenary Pool (inner City)" ...
##  $ Address  : chr  "1391 Beaudesert Road, Acacia Ridge" "Sugarwood Street, Bellbowrie" "Cnr Boundary Road and Waterford Road Wacol" "400 Gregory Terrace, Spring Hill" ...
##  $ Latitude : num  -27.6 -27.6 -27.6 -27.5 -27.4 ...
##  $ Longitude: num  153 153 153 153 153 ...

Legitimate use of options

Exercise

In general, you want to avoid having the return value of your own functions depend on any global options. That way, you and others can reason about your functions without needing to know the current state of the options.

It is, however, okay to have side effects of a function depend on global options. For example, the print() function uses getOption("digits") as the default for the digits argument. This gives users some control over how results are displayed, but doesn’t change the underlying computation.

Let’s take a look at an example function that uses a global default sensibly. The print.lm() function has the options digits with default max(3, getOption("digits") - 3).

Instructions

We’ve fit a regression model of fuel efficiency on weight using the mtcars dataset.

  • Use summary() to take a look at the fitted regression model. Pay particular attention to number of decimal places reported.
  • Set the global digits option to 2: options(digits = 2).
  • Take another look at the fitted model using summary(). Notice the number of decimal places has changed, but there is no change to the underlying fit object.
# Fit a regression model
fit <- lm(mpg ~ wt, data = mtcars)

# Look at the summary of the model
summary(fit)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
# Set the global digits option to 2
options(digits = 2)

# Take another look at the summary
summary(fit)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.543 -2.365 -0.125  1.410  6.873 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   37.285      1.878   19.86  < 2e-16 ***
## wt            -5.344      0.559   -9.56  1.3e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3 on 30 degrees of freedom
## Multiple R-squared:  0.753,  Adjusted R-squared:  0.745 
## F-statistic: 91.4 on 1 and 30 DF,  p-value: 1.29e-10

Video: Wrap-up