1. Advanced Looping — `lapply()` and `sapply()` (chapter 4 Data Camp’s Intermediate R)

Motivation

Here is a function to convert Fahrenheit to Celsius:

to_celsius <- function(x) {
  (x-32)*5/9
}

The function to_celsius happens to be a vectorized function:

to_celsius(c(32, 40, 50, 60, 70))

## [1]  0.000000  4.444444 10.000000 15.555556 21.111111

Here is another example of a vectorized function:

square_me <- function(vec) vec^2
square_me(c(1,2,3))

## [1] 1 4 9

What happens in this situation?

# trying to_celsius() on a list
to_celsius(list(32, 40, 50, 60, 70))

Outputs: Error in x - 32 : non-numeric argument to binary operator

to_celsius() does not work with a list, because arithmetic operators such as - are not defined for lists.

One solution is to use a for loop:

temps_fahrenheit <- list(32, 40, 50, 60, 70)
temps_celsius <- c()

for (temp in temps_fahrenheit) {
  temps_celsius <- c(temps_celsius, to_celsius(temp))
}
temps_celsius

## [1]  0.000000  4.444444 10.000000 15.555556 21.111111
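Growing a vector with c() inside a loop copies it on every iteration. A minimal sketch of the same loop with the result allocated up front; the index-based style with seq_along() is a choice made for this illustration, not part of the original example:

```r
to_celsius <- function(x) {
  (x - 32) * 5 / 9
}

temps_fahrenheit <- list(32, 40, 50, 60, 70)

# allocate the full result vector once instead of growing it with c()
temps_celsius <- numeric(length(temps_fahrenheit))

for (i in seq_along(temps_fahrenheit)) {
  # [[i]] extracts the i-th element itself (a number), not a sub-list
  temps_celsius[i] <- to_celsius(temps_fahrenheit[[i]])
}
temps_celsius
```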

R provides a set of functions to “vectorize” functions over the elements of lists.

lapply()
sapply()
vapply()

These functions let us avoid writing explicit loops and usually give shorter, more readable code. They also avoid the cost of growing a result vector one element at a time.
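As a quick sketch of the idea, here is the temperature conversion from the motivation done without a loop. It uses plain function calls, and sapply() is covered later in these notes:

```r
to_celsius <- function(x) {
  (x - 32) * 5 / 9
}

temps_fahrenheit <- list(32, 40, 50, 60, 70)

# lapply() calls to_celsius() once per element and returns a list
celsius_list <- lapply(temps_fahrenheit, to_celsius)

# sapply() does the same but simplifies the result to a numeric vector
celsius_vec <- sapply(temps_fahrenheit, to_celsius)
celsius_vec
```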

The simplest apply function is lapply().

lapply()

lapply() stands for list apply. It takes a list or vector and a function as inputs and returns a list.

Example:

Suppose we want to know the class of each element of the following list.

nyc <- list(
  pop = 8404837,
  boroughs = c("Manhattan", "Bronx", "Brooklyn", "Queens", "Staten Island"),
  capital = FALSE
)

Instead of writing a for loop, you can use lapply(). The examples below use the %>% pipe from the magrittr package; x %>% f(y) is equivalent to f(x, y).

library(magrittr)  # provides the %>% pipe

nyc %>% lapply(class)

## $pop
## [1] "numeric"
## 
## $boroughs
## [1] "character"
## 
## $capital
## [1] "logical"

The output of lapply is a list.
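To see what lapply() is doing, here is a rough for-loop equivalent. This is a sketch for intuition only; the variable name classes is made up for this illustration:

```r
nyc <- list(
  pop = 8404837,
  boroughs = c("Manhattan", "Bronx", "Brooklyn", "Queens", "Staten Island"),
  capital = FALSE
)

# an empty list of the right length, carrying over the element names
classes <- vector("list", length(nyc))
names(classes) <- names(nyc)

for (i in seq_along(nyc)) {
  classes[[i]] <- class(nyc[[i]])
}

# compare the loop result with what lapply() returns
identical(classes, lapply(nyc, class))
```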

Example:

players <- list(
  warriors = c('kurry', 'iguodala', 'thompson', 'green'),
  cavaliers = c('james', 'shumpert', 'thompson'),
  rockets = c('harden', 'howard')
)

players %>% lapply(length)

## $warriors
## [1] 4
## 
## $cavaliers
## [1] 3
## 
## $rockets
## [1] 2

lapply() applies the function length() to each element of the list. The output is another list.

Task for you

Here is a list of three vectors of numbers from the normal distribution.

mylist <- list(rnorm(3), rnorm(3), rnorm(5))
mylist

## [[1]]
## [1] -0.8284456 -0.1887556  0.5683757
## 
## [[2]]
## [1]  1.54101435 -0.47146725 -0.08813881
## 
## [[3]]
## [1]  1.2232356  0.3873549  2.3407349  0.3201347 -0.5176843

Find the minimum of each element in mylist.

#using lapply

min_list <- mylist %>% lapply(min) %>% unlist()   #unlist() converts a list to a vector
min_list

## [1] -0.8284456 -0.4714673 -0.5176843
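Because rnorm() draws random numbers, the values above change on every run. A small sketch that makes the task reproducible; the seed value 133 is arbitrary and chosen only for this illustration:

```r
set.seed(133)  # arbitrary seed, any fixed value works

mylist <- list(rnorm(3), rnorm(3), rnorm(5))

# one minimum per list element, flattened to a numeric vector
min_vec <- unlist(lapply(mylist, min))
min_vec

# there is exactly one minimum per element of mylist
length(min_vec) == length(mylist)
```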

What about functions that take more than one argument?

paste() concatenates the elements of a vector, joining them with the value of the argument collapse:

paste(c("stat", 133, "go"), collapse= "-")

## [1] "stat-133-go"

After the function argument in lapply(), we supply any extra arguments for that function:

players %>% lapply(paste, collapse = '-')

## $warriors
## [1] "kurry-iguodala-thompson-green"
## 
## $cavaliers
## [1] "james-shumpert-thompson"
## 
## $rockets
## [1] "harden-howard"
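The same forwarding works for any function. A short sketch with mean() and its na.rm argument; the scores list is made-up data for this illustration:

```r
scores <- list(a = c(1, NA, 3), b = c(4, 5))  # made-up data

# without na.rm, the NA propagates into the mean of element a
with_na <- lapply(scores, mean)

# na.rm = TRUE is forwarded to mean() on every call
without_na <- lapply(scores, mean, na.rm = TRUE)

with_na
without_na
```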

You can use your own functions in lapply():

num_chars <- function(x) {
  nchar(x)
}
lapply(players, num_chars)

## $warriors
## [1] 5 8 8 5
## 
## $cavaliers
## [1] 5 8 8
## 
## $rockets
## [1] 6 6

# a user-defined function with a second argument, y, supplied through lapply()
num_chars1 <- function(x, y) {
  nchar(x) + y
}
players %>% lapply(num_chars1, 3)

## $warriors
## [1]  8 11 11  8
## 
## $cavaliers
## [1]  8 11 11
## 
## $rockets
## [1] 9 9
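Extra arguments can also be passed by name, which controls which argument each list element fills. A minimal sketch; add_suffix() is a hypothetical helper written for this illustration:

```r
# hypothetical helper for this illustration
add_suffix <- function(name, suffix) {
  paste0(name, suffix)
}

players <- list(
  warriors = c("kurry", "iguodala", "thompson", "green"),
  rockets = c("harden", "howard")
)

# each list element fills the first free argument (name); suffix is fixed
by_team <- lapply(players, add_suffix, suffix = "!")

# naming `name` instead means each element is used as the suffix
by_suffix <- lapply(c("!", "?"), add_suffix, name = "harden")

by_team
by_suffix
```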

You can also pass a function with no name (an “anonymous” function):

1:3 %>% lapply(function(x) x^2)

## [[1]]
## [1] 1
## 
## [[2]]
## [1] 4
## 
## [[3]]
## [1] 9
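An anonymous function is not limited to a single expression. A small sketch with a multi-statement body; the variable name squared is chosen just for this example:

```r
# braces let an anonymous function hold several statements;
# the value of the last expression is returned
res <- lapply(1:3, function(x) {
  squared <- x^2
  squared + 1
})
res

# In R 4.1.0 and later, \(x) is shorthand for function(x)
```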

Another example:

The paste() function joins the string “mr” to each Warriors player’s last name.

paste("mr", c('kurry', 'iguodala', 'thompson', 'green'))

## [1] "mr kurry"    "mr iguodala" "mr thompson" "mr green"

To apply paste() to the list of players:

players %>% lapply(function(x) paste("mr", x))

## $warriors
## [1] "mr kurry"    "mr iguodala" "mr thompson" "mr green"   
## 
## $cavaliers
## [1] "mr james"    "mr shumpert" "mr thompson"
## 
## $rockets
## [1] "mr harden" "mr howard"

Task for you

nchar(c("adam","jim"))

## [1] 4 3

Use lapply() to add two to the number of characters in each player’s name in players.

players %>% lapply(function(x) nchar(x) + 2)

## $warriors
## [1]  7 10 10  7
## 
## $cavaliers
## [1]  7 10 10
## 
## $rockets
## [1] 8 8

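Remember that a data.frame is internally stored as a list of columns, so lapply() works on it column by column. (A matrix is different: it is an atomic vector with a dim attribute, so lapply() would visit its individual cells.)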

df <- data.frame(
  name = c('Luke', 'Leia', 'R2-D2', 'C-3PO'),
  gender = c('male', 'female', 'male', 'male'),
  height = c(1.72, 1.50, 0.96, 1.67),
  weight = c(77, 49, 32, 75)
)

df %>% lapply(class)

## $name
## [1] "factor"
## 
## $gender
## [1] "factor"
## 
## $height
## [1] "numeric"
## 
## $weight
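## [1] "numeric"

Note: the “factor” classes above come from R versions before 4.0.0, where data.frame() converted strings to factors by default. In R 4.0.0 and later, name and gender are “character”.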
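A short sketch of both points: checking which structures are lists, and applying a function to selected columns. The trimmed-down data frame and the name col_means are choices made for this illustration:

```r
df <- data.frame(
  name = c("Luke", "Leia", "R2-D2", "C-3PO"),
  height = c(1.72, 1.50, 0.96, 1.67),
  weight = c(77, 49, 32, 75)
)

is.list(df)                      # a data frame is a list of columns
is.list(matrix(1:6, nrow = 2))   # a matrix is not a list

# select the numeric columns by name, then average each one
col_means <- lapply(df[c("height", "weight")], mean)
col_means
```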

sapply()

sapply() is a modified version of lapply().

sapply() stands for simplified apply and will output the result as an array if possible.

1:3 %>% sapply(function(x) x^2)

## [1] 1 4 9

Here the output is a one-dimensional array (i.e. a vector). Notice that this is the same as

1:3 %>% lapply(function(x) x^2) %>% unlist()

## [1] 1 4 9

We have seen examples where the list returned by lapply() has elements that are vectors of different lengths. Such a result cannot be coerced to an array, since all the rows of an array must be the same length; in that case sapply() returns a list, just as lapply() does. However, if every element of the lapply() output has the same length, sapply() can return the results as an array.
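A minimal sketch of the two cases, reusing the players list from earlier; the result names team_sizes and name_lengths are made up for this example:

```r
players <- list(
  warriors = c("kurry", "iguodala", "thompson", "green"),
  cavaliers = c("james", "shumpert", "thompson"),
  rockets = c("harden", "howard")
)

# one number per element: simplified to a named integer vector
team_sizes <- sapply(players, length)

# results of lengths 4, 3 and 2: no simplification, a list comes back
name_lengths <- sapply(players, nchar)

is.list(team_sizes)
is.list(name_lengths)
```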

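For example:

first_and_last <- function(name) {
  name <- gsub(" ", "", name)                 # drop spaces
  chars <- strsplit(name, split = "")[[1]]    # split into single characters
  # min() and max() order characters by the current locale's collation rules
  c(first = min(chars), last = max(chars))
}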
 
first_and_last("New York")

## first  last 
##   "e"   "Y"

Now let’s apply this function to the following vector of cities.

cities <- c("New York", "Paris", "London", "Tokyo", "Rio de Janeiro", "Cape Town")

cities %>% sapply(first_and_last)

##       New York Paris London Tokyo Rio de Janeiro Cape Town
## first "e"      "a"   "d"    "k"   "a"            "a"
## last  "Y"      "s"   "o"    "y"   "R"            "w"

Notice that the output is a two dimensional array with meaningful row and column names.
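The row and column names make the result easy to index. A sketch using a locale-independent function on the players list; name_range() is a hypothetical helper written for this illustration:

```r
players <- list(
  warriors = c("kurry", "iguodala", "thompson", "green"),
  cavaliers = c("james", "shumpert", "thompson"),
  rockets = c("harden", "howard")
)

# hypothetical helper: shortest and longest name length in a team
name_range <- function(x) {
  c(shortest = min(nchar(x)), longest = max(nchar(x)))
}

# every call returns a vector of length 2, so sapply() builds a matrix
len_matrix <- sapply(players, name_range)

dim(len_matrix)                    # number of rows, number of columns
len_matrix["longest", "rockets"]   # index by row name and column name
t(len_matrix)                      # transpose: one row per team
```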

In DataCamp you will learn about vapply(), which produces the same kind of output as sapply() but requires you to specify the type and length of each result ahead of time (the FUN.VALUE argument). This makes it safer and sometimes faster.
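A brief sketch of vapply() on the players list. The tryCatch() wrapper and its message string are additions for this illustration, so that the failing call can be shown without stopping the script:

```r
players <- list(
  warriors = c("kurry", "iguodala", "thompson", "green"),
  cavaliers = c("james", "shumpert", "thompson"),
  rockets = c("harden", "howard")
)

# FUN.VALUE = integer(1): each call must return exactly one integer
team_sizes <- vapply(players, length, FUN.VALUE = integer(1))
team_sizes

# nchar() returns several values per team, so the declared shape is violated
# and vapply() raises an error instead of silently returning a list
result <- tryCatch(
  vapply(players, nchar, FUN.VALUE = integer(1)),
  error = function(e) "vapply() stopped with an error"
)
result
```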
