Source file ⇒ lec17.Rmd
lapply() and sapply()
(Chapter 4 of DataCamp’s Intermediate R)
Here is a function to convert Fahrenheit to Celsius:
to_celsius <- function(x) {
  (x - 32) * 5 / 9
}
The function to_celsius() happens to be a vectorized function:
to_celsius(c(32, 40, 50, 60, 70))
## [1] 0.000000 4.444444 10.000000 15.555556 21.111111
Here is another example of a vectorized function:
square_me <- function(vec) vec^2
square_me(c(1,2,3))
## [1] 1 4 9
What happens in this situation?
# trying to_celsius() on a list
to_celsius(list(32, 40, 50, 60, 70))
## Error in x - 32 : non-numeric argument to binary operator
to_celsius() does not work with a list, because the arithmetic operator - is not defined for list arguments.
One solution is to use a for loop:
temps_fahrenheit <- list(32, 40, 50, 60, 70)
temps_celsius <- c()
for (temp in temps_fahrenheit) {
  temps_celsius <- c(temps_celsius, to_celsius(temp))
}
temps_celsius
## [1] 0.000000 4.444444 10.000000 15.555556 21.111111
R provides a family of functions that apply (“vectorize”) a function over the elements of a list:
lapply()
sapply()
vapply()
These functions let us avoid writing explicit loops and produce more compact, readable code.
The simplest apply function is lapply().
lapply() stands for list apply. It takes a list or vector and a function as inputs and returns a list.
Example:
Suppose we want to know the class of each element of the following list.
nyc <- list(pop = 8404837, boroughs = c("Manhattan", "Bronx", "Brooklyn", "Queens", "Staten Island"), capital = FALSE)
Instead of writing a for loop to do this, you can use lapply():
library(magrittr)  # provides the %>% pipe used throughout these notes
nyc %>% lapply(class)
## $pop
## [1] "numeric"
##
## $boroughs
## [1] "character"
##
## $capital
## [1] "logical"
The output of lapply is a list.
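Returning to the temperature list from earlier, the for loop can be replaced by a single lapply() call. The following is a minimal sketch in base R; it redefines to_celsius() so that it runs on its own, and the variable names are chosen only for illustration.

```r
# Same conversion function as defined earlier in these notes
to_celsius <- function(x) {
  (x - 32) * 5 / 9
}

temps_fahrenheit <- list(32, 40, 50, 60, 70)

# lapply() calls to_celsius() once per list element and returns a list
temps_celsius <- lapply(temps_fahrenheit, to_celsius)

# unlist() flattens that list into an ordinary numeric vector
unlist(temps_celsius)
```

The result holds the same five values that the for loop produced, without the need to grow a vector by hand.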
Example:
players <- list(
warriors = c('kurry', 'iguodala', 'thompson', 'green'),
cavaliers = c('james', 'shumpert', 'thompson'),
rockets = c('harden', 'howard')
)
players %>% lapply(length)
## $warriors
## [1] 4
##
## $cavaliers
## [1] 3
##
## $rockets
## [1] 2
lapply() applies the function length() to each element of the list. The output is another list.
Here is a list of three vectors of numbers from the normal distribution.
mylist <- list(rnorm(3), rnorm(3), rnorm(5))
mylist
## [[1]]
## [1] -0.8284456 -0.1887556 0.5683757
##
## [[2]]
## [1] 1.54101435 -0.47146725 -0.08813881
##
## [[3]]
## [1] 1.2232356 0.3873549 2.3407349 0.3201347 -0.5176843
Find the minimum of each element in mylist.
#using lapply
min_list <- mylist %>% lapply(min) %>% unlist() #unlist() converts a list to a vector
min_list
## [1] -0.8284456 -0.4714673 -0.5176843
What about functions that take more than one argument?
paste() concatenates the elements of a vector, joining them with the value of the argument collapse:
paste(c("stat", 133, "go"), collapse= "-")
## [1] "stat-133-go"
After the function argument in lapply(), we list any extra arguments to be passed on to that function:
players %>% lapply(paste, collapse = '-')
## $warriors
## [1] "kurry-iguodala-thompson-green"
##
## $cavaliers
## [1] "james-shumpert-thompson"
##
## $rockets
## [1] "harden-howard"
You can use your own functions in lapply():
num_chars <- function(x) {
  nchar(x)
}
lapply(players, num_chars)
## $warriors
## [1] 5 8 8 5
##
## $cavaliers
## [1] 5 8 8
##
## $rockets
## [1] 6 6
Your own function can also take extra arguments; here y is supplied as 3:
num_chars1 <- function(x, y) {
  nchar(x) + y
}
players %>% lapply(num_chars1, 3)
## $warriors
## [1] 8 11 11 8
##
## $cavaliers
## [1] 8 11 11
##
## $rockets
## [1] 9 9
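Extra arguments can also be passed by name, which tends to be easier to read when the applied function has several parameters. The sketch below uses a small made-up list, scores, that is not part of the lecture data, together with the na.rm argument of base R's mean().

```r
# Hypothetical data for illustration: one vector contains a missing value
scores <- list(a = c(1, NA, 3), b = c(4, 5, 6))

# Without extra arguments, mean() returns NA for the vector with a missing value
without_na_rm <- lapply(scores, mean)

# na.rm = TRUE is forwarded by lapply() to every call of mean()
with_na_rm <- lapply(scores, mean, na.rm = TRUE)

without_na_rm
with_na_rm
```

Anything placed after the function in the lapply() call is handed to that function on every call, exactly as with collapse and the value 3 in the examples above.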
You can also define a function with no name (an “anonymous” function) directly inside the call:
1:3 %>% lapply(function(x) x^2)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 4
##
## [[3]]
## [1] 9
Another example:
The paste() function connects the string “mr” with each Warriors player’s last name.
paste("mr",c('kurry', 'iguodala', 'thompson', 'green'))
## [1] "mr kurry" "mr iguodala" "mr thompson" "mr green"
To apply paste() to every team in the list of players:
players %>% lapply(function(x) paste("mr", x))
## $warriors
## [1] "mr kurry" "mr iguodala" "mr thompson" "mr green"
##
## $cavaliers
## [1] "mr james" "mr shumpert" "mr thompson"
##
## $rockets
## [1] "mr harden" "mr howard"
nchar() counts the number of characters in each string:
nchar(c("adam", "jim"))
## [1] 4 3
Use lapply() to add two to the number of characters in each player’s name in players:
players %>% lapply(function(x) nchar(x) + 2)
## $warriors
## [1] 7 10 10 7
##
## $cavaliers
## [1] 7 10 10
##
## $rockets
## [1] 8 8
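Remember that a data.frame is internally stored as a list of columns, so lapply() works on it column by column. (A matrix is not a list: it is an atomic vector with a dim attribute, so lapply() would visit its elements one at a time.)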
df <- data.frame(
name = c('Luke', 'Leia', 'R2-D2', 'C-3PO'),
gender = c('male', 'female', 'male', 'male'),
height = c(1.72, 1.50, 0.96, 1.67),
weight = c(77, 49, 32, 75)
)
# In R >= 4.0, name and gender are "character" unless stringsAsFactors = TRUE is set
df %>% lapply(class)
## $name
## [1] "factor"
##
## $gender
## [1] "factor"
##
## $height
## [1] "numeric"
##
## $weight
## [1] "numeric"
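Because each column of a data frame is one list element, lapply() can also compute a per-column summary. The following is a minimal sketch using a reduced copy of the data frame above (only the name, height, and weight columns); the names numeric_cols and col_means are chosen for illustration.

```r
df <- data.frame(
  name   = c("Luke", "Leia", "R2-D2", "C-3PO"),
  height = c(1.72, 1.50, 0.96, 1.67),
  weight = c(77, 49, 32, 75)
)

# A data frame is a list whose elements are its columns
is.list(df)

# Select the numeric columns by name, then average each one
numeric_cols <- df[c("height", "weight")]
col_means <- lapply(numeric_cols, mean)
col_means
```

The result is a named list with one mean per selected column, in the same order as the columns.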
sapply() is a modified version of lapply().
sapply() stands for simplified apply; it outputs the result as a vector or array if possible.
1:3 %>% sapply(function(x) x^2)
## [1] 1 4 9
Here we output a one-dimensional array (i.e. a vector). Notice this is the same as
1:3 %>% lapply(function(x) x^2) %>% unlist()
## [1] 1 4 9
We have seen examples where the list output by lapply() has elements that are vectors of different sizes. It isn’t possible to coerce such a list to an array, since all the columns of an array must be the same size. However, if the output of lapply() has vectors of equal length, sapply() can return the results as an array.
For example:
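first_and_last <- function(name) {
  name <- gsub(" ", "", name)                 # drop spaces
  chars <- strsplit(name, split = "")[[1]]    # one string per character
  # alphabetical minimum and maximum (the ordering depends on the locale)
  c(first = min(chars), last = max(chars))
}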
first_and_last("New York")
## first last
## "e" "Y"
Now let’s apply this function to the following vector of cities.
cities <- c("New York", "Paris", "London", "Tokyo", "Rio de Janeiro", "Cape Town")
cities %>% sapply(first_and_last)
| | New York | Paris | London | Tokyo | Rio de Janeiro | Cape Town |
|---|---|---|---|---|---|---|
| first | e | a | d | k | a | a |
| last | Y | s | o | y | R | w |
Notice that the output is a two dimensional array with meaningful row and column names.
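The two cases described above (results of unequal length versus results of equal length) can be compared side by side. This sketch reuses the players list from earlier in these notes; the names name_lengths and ranges are chosen for illustration.

```r
players <- list(
  warriors  = c("kurry", "iguodala", "thompson", "green"),
  cavaliers = c("james", "shumpert", "thompson"),
  rockets   = c("harden", "howard")
)

# The results have different lengths (4, 3 and 2), so sapply() cannot
# simplify them and returns a list, just as lapply() would
name_lengths <- sapply(players, nchar)
class(name_lengths)

# Every result has length 2 (shortest and longest name), so sapply()
# binds the results into a matrix with one column per team
ranges <- sapply(players, function(x) range(nchar(x)))
dim(ranges)
```

Since the shape of the result depends on the data, it is worth checking what sapply() actually returned before indexing into it.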
In DataCamp you will learn about vapply(), which produces the same kind of output as sapply() but requires you to specify the type and length of each result ahead of time (the FUN.VALUE argument). This makes it safer, and it can be somewhat faster to run.
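As a minimal sketch of how vapply() is called, the example below reuses the players list from earlier in these notes. FUN.VALUE is the template that each result must match; the names team_sizes and result are chosen for illustration.

```r
players <- list(
  warriors  = c("kurry", "iguodala", "thompson", "green"),
  cavaliers = c("james", "shumpert", "thompson"),
  rockets   = c("harden", "howard")
)

# FUN.VALUE = integer(1) declares that every call must return one integer
team_sizes <- vapply(players, length, FUN.VALUE = integer(1))
team_sizes

# nchar() returns several values per team, which does not match the template,
# so vapply() stops with an error instead of returning a different shape
result <- tryCatch(
  vapply(players, nchar, FUN.VALUE = integer(1)),
  error = function(e) "vapply() raised an error"
)
result
```

Unlike sapply(), which would fall back to returning a list in the second call, vapply() fails loudly, which makes mistakes easier to catch inside larger programs.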