Vectorization is a core idiom in R, although it is not unique to the language. A vectorized function works not just on a single value but on a whole vector of values at once. Instead of looping over every value of the vector and applying a function inside the loop, you call the function once on the whole vector, which can often reduce the code to a single line.
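As a minimal sketch of the difference, the snippet below computes square roots twice: once with an explicit loop and once with a single vectorized call. The vector x and the variable names are made up for illustration.

```r
# Illustrative data; not part of the original examples
x <- c(1, 4, 9, 16)

# Loop version: visit each element and store the result
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# Vectorized version: sqrt() works on the whole vector at once
roots_vec <- sqrt(x)

# Both approaches give the same values
identical(roots_loop, roots_vec)
```

Both versions produce the same numbers; the vectorized call simply hides the loop.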
Common Vectorized Functions in R
lapply( ) - loop over a list and evaluate a function on each element
sapply( ) - same as lapply( ) but try to simplify the result
apply( ) - apply a function over the margins of an array
tapply( ) - apply a function over subsets of a vector
mapply( ) - multivariate version of lapply
split( ) - auxiliary function used with lapply( ) and sapply( ) because it splits objects into subpieces
lapply( )
lapply( ) loops over a list and evaluates a function on each element. lapply( ) always returns a list, regardless of the class of the input.
str(lapply)
## function (X, FUN, ...)
X - the list we would like to apply some function to
FUN - the function we would like to apply to each element in the list
… - specify any other arguments to send to the function
head(lapply) # the R-level wrapper; the loop itself runs in C via .Internal
##
## 1 function (X, FUN, ...)
## 2 {
## 3 FUN <- match.fun(FUN)
## 4 if (!is.vector(X) || is.object(X))
## 5 X <- as.list(X)
## 6 .Internal(lapply(X, FUN))
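As the source shows, if you pass something that is not a plain vector or list (for example a data frame or another classed object), lapply( ) first converts it with as.list( ). The looping itself is done in C through .Internal( ). To learn more about R’s C interface check out this site: http://adv-r.had.co.nz/C-interface.html.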
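To make this concrete, here is a small sketch with two non-list inputs, a data frame and an integer vector. The objects df, col_sums and squares are invented for this illustration. In both cases the result comes back as a list.

```r
# A data frame is a list of columns, so lapply() visits each column
df <- data.frame(a = 1:2, b = 3:4)
col_sums <- lapply(df, sum)

# An atomic vector is visited element by element
squares <- lapply(1:3, function(i) i^2)

# Either way, the return value is a list
is.list(col_sums)
is.list(squares)
```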
Let’s go through some examples. Throughout these examples, we’ll be using rnorm( ) to generate random numbers from a defined normal distribution and runif( ) to generate uniform random variables. So what arguments do rnorm( ) and runif( ) take?
str(rnorm) # sample size, then the mean and standard deviation of the distribution
## function (n, mean = 0, sd = 1)
str(runif) # sample size, and the lower and upper limits of the distribution
## function (n, min = 0, max = 1)
Example 1. Take the mean of each element in a list.
x <- list(a = 1:5, b = rnorm(10)) # list w/ 2 elements.
l <- lapply(x, mean)
l # new values assembled in a new list
## $a
## [1] 3
##
## $b
## [1] 0.2953307
Example 2. Take the mean of each element in a list.
x <- list(a=1:4, b=rnorm(10), c=rnorm(20,1), d= rnorm(100, 5))
lapply(x, mean)
## $a
## [1] 2.5
##
## $b
## [1] 0.6096763
##
## $c
## [1] 0.9237171
##
## $d
## [1] 5.146349
Example 3. Apply a function to a vector in lapply( ).
You can use lapply( ) to evaluate a function multiple times, each time with a different argument. Below is an example where I call the runif( ) function (to generate uniformly distributed random variables) four times, each time generating a different number of random numbers.
x <- 1:4
#mean(x)
lapply(x, runif)
## [[1]]
## [1] 0.9027863
##
## [[2]]
## [1] 0.9363437 0.3141462
##
## [[3]]
## [1] 0.6517832 0.2041103 0.4341976
##
## [[4]]
## [1] 0.6625782 0.7502077 0.7816101 0.5590214
Example 4. Add additional FUN arguments.
x <- 1:4
lapply(x, runif, min=0, max=10)
## [[1]]
## [1] 1.604497
##
## [[2]]
## [1] 6.821681 8.059541
##
## [[3]]
## [1] 0.4648676 9.5136075 4.6491326
##
## [[4]]
## [1] 0.1959291 0.1552903 1.7185262 1.5683099
So now, instead of the random numbers being between 0 and 1 (the default), they are all between 0 and 10.
Example 5. Anonymous functions.
Anonymous functions have no name. They can be defined inline inside the lapply( ) call, and they do not exist outside of it.
For example, let’s make an anonymous function for extracting the first column of each matrix.
x <- list(a = matrix(1:4, 2, 2), b = matrix( 1:6, 3, 2))
lapply(x, function(elt) elt[,1])
## $a
## [1] 1 2
##
## $b
## [1] 1 2 3
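For comparison, the sketch below does the same extraction twice, once with a named helper and once with the anonymous function. The helper name first_col is hypothetical; the point is only that the two calls should return the same list.

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

# Named helper (hypothetical name): stays in the workspace after the call
first_col <- function(elt) elt[, 1]
named_result <- lapply(x, first_col)

# Anonymous function: defined inline and discarded afterwards
anon_result <- lapply(x, function(elt) elt[, 1])

identical(named_result, anon_result)
```

A named helper is worth the extra line when the same function is reused in several calls; for a one-off, the inline form keeps the workspace clean.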
sapply( )
sapply( ) will try to simplify the result of lapply if possible.
If the result is a list where every element is length 1, then a vector is returned
If the result is a list where every element is a vector of the same length (>1), a matrix is returned.
If it can’t figure things out, a list is returned.
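The three rules above can be sketched with a toy list. The list x and the result names are made up for this illustration.

```r
x <- list(a = 1:4, b = 5:8)

# Every result has length 1, so sapply() returns a named vector
means <- sapply(x, mean)

# Every result has length 2, so sapply() returns a matrix (one column per element)
ranges <- sapply(x, range)

# Results have different lengths, so sapply() falls back to a list
ragged <- sapply(list(a = 1:2, b = 1:5), function(v) v[v > 1])

is.numeric(means)
is.matrix(ranges)
is.list(ragged)
```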
Example 1. Take the mean of each element in a list.
x <- list(a=1:4, b=rnorm(10), c=rnorm(20,1), d=rnorm(100,5))
lapply(x,mean)
## $a
## [1] 2.5
##
## $b
## [1] -0.4353834
##
## $c
## [1] 0.5798263
##
## $d
## [1] 4.942518
sapply(x,mean)
## a b c d
## 2.5000000 -0.4353834 0.5798263 4.9425177
split( )
The benefit of combining split( ) with lapply( ) or sapply( ) is to take a data structure, split it into subsets defined by another variable, and apply a function over those subsets.
str(split)
## function (x, f, drop = FALSE, ...)
x is a vector (or list) or data frame
f is a factor (or coerced to one) or a list of factors
drop indicates whether empty factor levels should be dropped
Example 1
Let’s use the gl( ) function to “generate levels” in a factor variable. An R factor stores categorical data as levels, and it can be built from both character and integer data.
str(gl)
## function (n, k, length = n * k, labels = seq_len(n), ordered = FALSE)
?gl
x <- c(rnorm(10), runif(10), rnorm(10,1))
f <- gl(3,10)
split(x,f)
## $`1`
## [1] -0.59551470 0.02627083 -0.07124994 -0.49352066 0.60030041 -0.33435199
## [7] 1.02329748 0.24600562 0.16653438 1.88206784
##
## $`2`
## [1] 0.9134681 0.2727243 0.9025976 0.5090428 0.8845266 0.2132206 0.1856311
## [8] 0.8944454 0.4285311 0.6840332
##
## $`3`
## [1] -0.1115023 1.4412868 0.1431598 0.9567117 1.9758419 -0.3024723
## [7] 2.9196340 2.9653031 -0.4777817 2.6517442
lapply(split(x,f), mean)
## $`1`
## [1] 0.2449839
##
## $`2`
## [1] 0.5888221
##
## $`3`
## [1] 1.216193
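The same split-then-apply pattern works on columns of a data frame. As a small sketch using the built-in mtcars data (the variable names mpg_by_cyl and mean_mpg are arbitrary), this splits fuel economy by number of cylinders and averages each piece.

```r
# Split the mpg column into one vector per cylinder count
mpg_by_cyl <- split(mtcars$mpg, mtcars$cyl)
names(mpg_by_cyl)

# Apply mean() to each piece and simplify to a named vector
mean_mpg <- sapply(mpg_by_cyl, mean)
mean_mpg
```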
apply( )
apply( ) applies a function over the margins of an array.
str(apply)
## function (X, MARGIN, FUN, ...)
X - the object we would like to apply some function to
MARGIN - specifies if the function is applied to rows or columns, 1 = row and 2 = column.
FUN - the function we would like to apply
… - specify any other arguments to send to the function
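Before moving to the stock data, here is a tiny sketch of what MARGIN does, using a made-up 2 x 3 matrix m.

```r
# matrix() fills column by column: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

row_totals <- apply(m, MARGIN = 1, FUN = sum)  # one value per row
col_totals <- apply(m, MARGIN = 2, FUN = sum)  # one value per column

row_totals  # 9 12
col_totals  # 3 7 11
```

For plain sums and means, the dedicated functions rowSums( ), colSums( ), rowMeans( ) and colMeans( ) give the same results.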
Example 1. Calculating the mean price of each of the stocks over the 10 days.
d <- as.data.frame(matrix(
c(185.74, 184.26, 162.21, 159.04, 164.87,
162.72, 157.89, 159.49, 150.22, 151.02,
1.47, 1.56, 1.39, 1.43, 1.42,
1.36, NA, 1.43, 1.57, 1.54,
1605, 1580, 1490, 1520, 1550,
1525, 1495, 1485, 1470, 1510,
95.05, 97.49, 88.57, 85.55, 92.04,
91.70, 89.88, 93.17, 90.12, 92.14), ncol=4, nrow=10,
dimnames = list(c("Day1","Day2","Day3","Day4","Day5",
"Day6","Day7","Day8","Day9","Day10"),
c("Stock1", "Stock2", "Stock3", "Stock4"))))
d
## Stock1 Stock2 Stock3 Stock4
## Day1 185.74 1.47 1605 95.05
## Day2 184.26 1.56 1580 97.49
## Day3 162.21 1.39 1490 88.57
## Day4 159.04 1.43 1520 85.55
## Day5 164.87 1.42 1550 92.04
## Day6 162.72 1.36 1525 91.70
## Day7 157.89 NA 1495 89.88
## Day8 159.49 1.43 1485 93.17
## Day9 150.22 1.57 1470 90.12
## Day10 151.02 1.54 1510 92.14
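AVG <- apply(X=d, MARGIN=2, FUN=mean) # Stock2 comes out as NA because of the missing Day7 value
AVG <- apply(X=d, MARGIN=2, FUN=mean, na.rm=TRUE) # na.rm=TRUE is passed on to mean( ) and drops the NA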
AVG
## Stock1 Stock2 Stock3 Stock4
## 163.746000 1.463333 1523.000000 91.571000
colMeans(d, na.rm=TRUE)
## Stock1 Stock2 Stock3 Stock4
## 163.746000 1.463333 1523.000000 91.571000
Example 2. Find max of each stock.
apply(X=d, MARGIN=2, FUN=max, na.rm=TRUE) # find max
## Stock1 Stock2 Stock3 Stock4
## 185.74 1.57 1605.00 97.49
Example 3. Calculate the 20th and 80th percentile of each stock.
The probs argument tells quantile( ) which percentiles to calculate.
apply(X=d, MARGIN=2, FUN=quantile, probs=c(0.2, 0.8), na.rm=TRUE)
## Stock1 Stock2 Stock3 Stock4
## 20% 156.516 1.408 1489 89.618
## 80% 168.748 1.548 1556 93.546
Example 4. Plot the data.
par(mfrow=c(2,2))
apply(X=d, MARGIN=2, FUN=plot, type="l", main="stock", ylab= "Price", xlab="Day")
## NULL
Example 5. Sum each row.
apply(X=d, MARGIN=1, FUN=sum, na.rm=TRUE)
## Day1 Day2 Day3 Day4 Day5 Day6 Day7 Day8 Day9 Day10
## 1887.26 1863.31 1742.17 1766.02 1808.33 1780.78 1742.77 1739.09 1711.91 1754.70
Example 6. Plot market trends.
plot(apply(X=d, MARGIN=1, FUN=sum, na.rm=TRUE), type="l", ylab= "Total Market Value", xlab="Day", main="Market Trend")
points(apply(d, 1, FUN=sum, na.rm=TRUE), pch=16, col="blue")
tapply( )
tapply( ) applies a function to subsets of a variable or vector. It is a specialized loop/subsetting function that is more concise than repeated use of square brackets or the subset( ) function. It lets you divide a variable into groups based on one or more other variables, and then apply a function to each group.
LungCapData <- read.table("LungCapData.txt", header=TRUE)
str(LungCapData)
## 'data.frame': 725 obs. of 6 variables:
## $ LungCap : num 6.47 10.12 9.55 11.12 4.8 ...
## $ Age : int 6 18 16 14 5 11 8 11 15 11 ...
## $ Height : num 62.1 74.7 69.7 71 56.9 58.7 63.3 70.4 70.5 59.2 ...
## $ Smoke : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
## $ Gender : Factor w/ 2 levels "female","male": 2 1 1 2 2 1 2 2 2 2 ...
## $ Caesarean: Factor w/ 2 levels "no","yes": 1 1 2 1 1 1 2 1 1 1 ...
attach(LungCapData)
attach( ) attaches a data frame (or list) to the search path, so you can refer to its variables by name alone rather than as components of the data frame (e.g., in the example above, you can write Age rather than LungCapData$Age).
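If you would rather not change the search path, with( ) offers a similar shortcut for a single expression. The sketch below uses a small made-up data frame called lung as a stand-in, since LungCapData.txt may not be on your machine.

```r
# Hypothetical stand-in for LungCapData (values invented for illustration)
lung <- data.frame(
  Age   = c(10, 12, 15, 17),
  Smoke = factor(c("no", "no", "yes", "yes"))
)

# Inside with(), columns can be used by name without attach()
mean_age <- with(lung, mean(Age))
mean_age  # 13.5
```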
str(tapply)
## function (X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)
X = an atomic object, typically a vector
INDEX = grouping variable same length as X and used to create subsets of the data
FUN = function
… = additional arguments to pass to FUN
simplify = if TRUE (the default) and FUN always returns a single value, the result is a vector or array; if FALSE, the result is always returned in list form
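Here is a self-contained sketch of these arguments on toy data, so it runs without the LungCapData file. The vectors age and smoke (lowercase) are invented for this illustration.

```r
# Toy data: six people, three in each group
age   <- c(10, 12, 15, 17, 11, 16)
smoke <- factor(c("no", "no", "yes", "yes", "no", "yes"))

# One mean per level of the grouping variable
group_means <- tapply(X = age, INDEX = smoke, FUN = mean)
group_means  # no = 11, yes = 16

# The same numbers computed by hand with square brackets
manual <- c(mean(age[smoke == "no"]), mean(age[smoke == "yes"]))
```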
Example 1. Calculate mean age of smokers and non-smokers separately.
tapply(X=Age, INDEX=Smoke, FUN=mean, na.rm=T)
## no yes
## 12.03549 14.77922
tapply(X=Age, INDEX=Smoke, FUN=mean, na.rm=T, simplify=FALSE) # returns a list format
## $no
## [1] 12.03549
##
## $yes
## [1] 14.77922
What does this look like with square brackets?
mean(Age[Smoke=="no"])
## [1] 12.03549
mean(Age[Smoke=="yes"])
## [1] 14.77922
Example 2. Apply the summary function to groups.
tapply(Age, Smoke, summary)
## $no
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 9.00 12.00 12.04 15.00 19.00
##
## $yes
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 13.00 15.00 14.78 17.00 19.00
Example 3. Apply the quantile function to groups.
tapply(Age, Smoke, quantile, probs=c(0.2,0.8))
## $no
## 20% 80%
## 8 16
##
## $yes
## 20% 80%
## 12 17
Example 4. ‘subset’ based on multiple variables/vectors. Calculate the mean Age for Smoker/NonSmoker and male/female.
tapply(X=Age, INDEX=list(Smoke,Gender), FUN=mean, na.rm=T)
## female male
## no 12.12739 11.94910
## yes 14.75000 14.81818
What does this look like with square brackets?
mean(Age[Smoke=="no" & Gender=="female"])
## [1] 12.12739
mean(Age[Smoke=="no" & Gender=="male"])
## [1] 11.9491
mean(Age[Smoke=="yes" & Gender=="female"])
## [1] 14.75
mean(Age[Smoke=="yes" & Gender=="male"])
## [1] 14.81818
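The by( ) function does the same job as tapply( ), except that it returns an object of class "by", which prints one block per group. As shown further down, c( ) converts it to a plain numeric vector.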
by(Age, list(Smoke, Gender), mean, na.rm=T)
## : no
## : female
## [1] 12.12739
## ------------------------------------------------------------
## : yes
## : female
## [1] 14.75
## ------------------------------------------------------------
## : no
## : male
## [1] 11.9491
## ------------------------------------------------------------
## : yes
## : male
## [1] 14.81818
temp <- by(Age, list(Smoke, Gender), mean, na.rm=T)
temp[4]
## [1] 14.81818
class(temp)
## [1] "by"
temp2 <- c(temp) # convert to a vector
temp2
## [1] 12.12739 14.75000 11.94910 14.81818
class(temp2)
## [1] "numeric"
mapply( )
mapply( ) is a multivariate version of sapply( ) and lapply( ). It applies a function element-wise over several arguments at once (“in parallel” in the sense of walking through the arguments in step, not of running on multiple cores).
sapply( ), lapply( ), and tapply( ) only apply a function over the elements of a single object. So what happens if you have two lists you want to apply a function over? sapply( ) and lapply( ) can’t be used directly for that purpose. You could write a for loop that indexes each list and passes the matching elements to the function.
But mapply( ) can take multiple list arguments and apply a function to their elements in parallel.
str(mapply)
## function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
FUN is a function to apply
… contains arguments to apply over
MoreArgs is a list of other arguments to FUN
SIMPLIFY indicates whether the result should be simplified
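A short sketch of how these arguments fit together, using made-up inputs: two vectors are walked in step, MoreArgs supplies an argument that stays fixed across calls, and SIMPLIFY = FALSE keeps the result as a list.

```r
# Walk two vectors in step: 1+4, 2+5, 3+6
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
sums  # 5 7 9

# MoreArgs holds arguments that are the same for every call
reps <- mapply(rep, times = 1:3, MoreArgs = list(x = "a"))

# SIMPLIFY = FALSE always returns a list, even when a vector would do
sums_list <- mapply(function(x, y) x + y, 1:3, 4:6, SIMPLIFY = FALSE)
```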
Example 1
list(rep(1,4), rep(2,3), rep(3,2), rep(4,1)) # tedious to type
## [[1]]
## [1] 1 1 1 1
##
## [[2]]
## [1] 2 2 2
##
## [[3]]
## [1] 3 3
##
## [[4]]
## [1] 4
mapply(rep, 1:4, 4:1)
## [[1]]
## [1] 1 1 1 1
##
## [[2]]
## [1] 2 2 2
##
## [[3]]
## [1] 3 3
##
## [[4]]
## [1] 4
Example 2
noise <- function(n, mean, sd) {
set.seed(100)
rnorm(n, mean, sd)
}
noise(5,1,2)
## [1] -0.004384701 1.263062331 0.842165820 2.773569619 1.233942541
noise(1:5, 1:5, 2) # passing vectors directly does not work as intended: only 5 values come back
## [1] -0.004384701 2.263062331 2.842165820 5.773569619 5.233942541
mapply(noise, 1:5, 1:5, 2) # this is the intended result:
## [[1]]
## [1] -0.004384701
##
## [[2]]
## [1] 0.9956153 2.2630623
##
## [[3]]
## [1] 1.995615 3.263062 2.842166
##
## [[4]]
## [1] 2.995615 4.263062 3.842166 5.773570
##
## [[5]]
## [1] 3.995615 5.263062 4.842166 6.773570 5.233943
# which is the same as:
list(noise(1,1,2), noise(2,2,2),
noise(3,3,2), noise(4,4,2),
noise(5,5,2))
## [[1]]
## [1] -0.004384701
##
## [[2]]
## [1] 0.9956153 2.2630623
##
## [[3]]
## [1] 1.995615 3.263062 2.842166
##
## [[4]]
## [1] 2.995615 4.263062 3.842166 5.773570
##
## [[5]]
## [1] 3.995615 5.263062 4.842166 6.773570 5.233943
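Why does the direct call return only five numbers? When n is a vector of length greater than one, rnorm( ) uses the length of n as the number of draws and recycles the means. The sketch below calls rnorm( ) directly rather than through noise( ), and only checks the lengths of the results.

```r
# Direct call: length(1:5) = 5 draws, one per mean
set.seed(100)
direct <- rnorm(1:5, 1:5, 2)
length(direct)  # 5

# mapply(): one separate call per (n, mean) pair, so sizes 1 to 5
set.seed(100)
per_call <- mapply(rnorm, 1:5, 1:5, 2)
lengths(per_call)  # 1 2 3 4 5
```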
Common purrr vectorized functions
Vectorization Continued: Purrr
Map functions are vectorized functions available through the purrr library. They are extremely similar to the vectorized functions already available in base R, so this section will give you more exposure to vectorization. In general, map functions transform their input by applying a function to each element of a list or atomic vector and returning an object of the same length as the input.
Here is also a great site that explores in more detail the functional tools of map functions: https://adv-r.hadley.nz/functionals.html#map. Also here is the purrr cheatsheet: https://github.com/rstudio/cheatsheets/blob/master/purrr.pdf.
library(purrr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
map( ) - just like lapply( ) in that it will loop over an object and evaluate a function on each element of that object. Then, it will return a list. But it is inconvenient to return a list when a simpler data structure would do, so there are four more specific variants:
map_lgl( ), map_int( ), map_dbl( ), and map_chr( ) - returns an atomic vector of the indicated type
map_dfr( ) and map_dfc( ) return a data frame created by row-binding and column-binding respectively. They require dplyr to be installed.
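As a quick sketch of how these relate to base R, the snippet below runs the same computation with lapply( ), map( ) and map_dbl( ). It assumes purrr is installed; the list x and the result names are arbitrary.

```r
library(purrr)

# Illustrative list; the names are arbitrary
x <- list(a = 1:5, b = c(2, 4, 6))

as_list_base  <- lapply(x, mean)   # base R: returns a list
as_list_purrr <- map(x, mean)      # purrr: also returns a list
as_vector     <- map_dbl(x, mean)  # purrr: returns a named double vector

all.equal(as_list_base, as_list_purrr)
as_vector
```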
map( )
Now let’s repeat the first lapply( ) example using map( ).
Example 1.
x <- list(a = 1:5, b = rnorm(10)) # list w/ 2 elements.
l <- map(x, mean)
l
## $a
## [1] 3
##
## $b
## [1] 0.01139972
Example 2.
values <- 1:10
map(values, function(x) {rnorm(10, x)})
## [[1]]
## [1] 0.97068329 0.61114575 1.51085626 0.08618581 3.31029682 0.56191002
## [7] 1.76406062 1.26196129 1.77340460 0.18562088
##
## [[2]]
## [1] 1.5615494 1.2797784 2.2309445 0.8422705 2.2470760 1.9088864 3.7573756
## [8] 1.8620704 1.8888065 1.3099857
##
## [[3]]
## [1] 2.778206 3.182908 3.417323 4.065402 3.970202 2.898371 4.403203 1.223224
## [9] 3.622867 2.477717
##
## [[4]]
## [1] 5.322231 3.636560 5.319066 4.043779 2.121344 3.552938 2.261402 4.178865
## [9] 5.897466 1.728075
##
## [[5]]
## [1] 5.980464 3.601174 6.824872 6.381299 4.161148 4.738004 4.931156 4.621116
## [9] 7.581959 5.129834
##
## [[6]]
## [1] 5.286975 6.637994 6.201692 5.930083 5.907510 6.448903 4.935644 4.837581
## [9] 7.648522 3.937904
##
## [[7]]
## [1] 7.012750 5.912472 7.270539 8.008452 4.925595 7.896822 6.950004 5.654651
## [9] 5.068788 7.709582
##
## [[8]]
## [1] 7.842095 8.216368 8.817362 9.727176 7.896230 7.442878 9.428301 7.107043
## [9] 6.842429 7.469704
##
## [[9]]
## [1] 11.445683 8.167504 9.413520 7.821317 7.825965 8.667077 10.363114
## [8] 8.530853 9.842876 7.542006
##
## [[10]]
## [1] 9.599694 9.223583 9.630703 11.240101 9.892566 10.172594 10.254601
## [8] 9.385466 8.570785 9.669025
1:10 %>%
map(~ rnorm(10, .x))
## [[1]]
## [1] 1.1283861 2.0181200 0.7444263 0.6974590 2.6151907 0.2262866
## [7] 1.4240024 0.4160530 1.4150357 -0.5452617
##
## [[2]]
## [1] 1.481250 1.720208 3.007457 1.530430 2.297897 1.582206 1.149619 2.689046
## [9] 1.539804 3.348184
##
## [[3]]
## [1] 3.4430714 2.8490738 3.4555489 2.9598453 3.4561210 2.5915750 0.8635061
## [8] 3.1568219 3.6600489 2.0181656
##
## [[4]]
## [1] 2.886356 3.562652 3.483889 4.418996 4.134155 5.034686 5.653503 3.982053
## [9] 3.975797 4.250247
##
## [[5]]
## [1] 4.662875 4.886646 4.901117 5.264087 5.138984 4.757731 5.059031 4.822728
## [9] 5.794680 5.006738
##
## [[6]]
## [1] 5.370210 5.747510 5.309578 6.202542 6.846381 6.632074 6.201414 5.908929
## [9] 6.289484 5.945315
##
## [[7]]
## [1] 4.958150 7.358369 6.627399 8.268309 9.168600 5.760277 7.589874 7.124019
## [9] 6.476292 7.620228
##
## [[8]]
## [1] 8.708222 7.906802 7.704803 6.914185 7.375185 7.766993 7.749183 8.953895
## [9] 7.734027 9.895276
##
## [[9]]
## [1] 8.570009 10.575547 9.161941 7.914547 9.576937 9.028172 8.643297
## [8] 9.852626 9.513365 10.018203
##
## [[10]]
## [1] 8.978521 9.438332 8.987444 6.979186 10.332350 11.240512 10.671350
## [8] 8.669966 9.149420 8.211169
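The two calls above differ only in notation: ~ rnorm(10, .x) is purrr’s formula shorthand for function(x) rnorm(10, x). A small sketch to check this, resetting the seed before each call so the random draws match (the result names are made up):

```r
library(purrr)

# Full anonymous function
set.seed(42)
with_function <- map(1:3, function(x) rnorm(2, mean = x))

# Formula shorthand: .x stands for the current element
set.seed(42)
with_formula <- map(1:3, ~ rnorm(2, mean = .x))

identical(with_function, with_formula)
```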
Example 3.
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
mtcars %>%
split(.$cyl) %>%
map(~ lm(mpg ~ wt, data = .)) %>%
map(summary)
## $`4`
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1513 -1.9795 -0.6272 1.9299 5.2523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.571 4.347 9.104 7.77e-06 ***
## wt -5.647 1.850 -3.052 0.0137 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.332 on 9 degrees of freedom
## Multiple R-squared: 0.5086, Adjusted R-squared: 0.454
## F-statistic: 9.316 on 1 and 9 DF, p-value: 0.01374
##
##
## $`6`
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Residuals:
## Mazda RX4 Mazda RX4 Wag Hornet 4 Drive Valiant Merc 280
## -0.1250 0.5840 1.9292 -0.6897 0.3547
## Merc 280C Ferrari Dino
## -1.0453 -1.0080
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.409 4.184 6.789 0.00105 **
## wt -2.780 1.335 -2.083 0.09176 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.165 on 5 degrees of freedom
## Multiple R-squared: 0.4645, Adjusted R-squared: 0.3574
## F-statistic: 4.337 on 1 and 5 DF, p-value: 0.09176
##
##
## $`8`
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1491 -1.4664 -0.8458 1.5711 3.7619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.8680 3.0055 7.942 4.05e-06 ***
## wt -2.1924 0.7392 -2.966 0.0118 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.024 on 12 degrees of freedom
## Multiple R-squared: 0.423, Adjusted R-squared: 0.3749
## F-statistic: 8.796 on 1 and 12 DF, p-value: 0.01179
map_lgl( ), map_int( ), map_dbl( ), and map_chr( )
purrr uses the convention that suffixes, like _dbl, refer to the output type. All map_*( ) functions can take any type of vector as input.
Example 1.
map_chr(mtcars, typeof) # always returns a character vector
## mpg cyl disp hp drat wt qsec vs
## "double" "double" "double" "double" "double" "double" "double" "double"
## am gear carb
## "double" "double" "double"
map_lgl(mtcars, is.double) # always returns a logical vector
## mpg cyl disp hp drat wt qsec vs am gear carb
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
n_unique <- function(x) { length(unique(x)) }
map_int(mtcars, n_unique) # always returns an integer vector
## mpg cyl disp hp drat wt qsec vs am gear carb
## 25 3 27 22 22 29 30 2 2 3 6
map_dbl(mtcars, mean) # always returns a double vector (also known as floats)
## mpg cyl disp hp drat wt qsec
## 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
## vs am gear carb
## 0.437500 0.406250 3.687500 2.812500
Example 2.
1:10 %>%
map(rnorm, n = 10) %>% # output a list
map_dbl(mean) # output an atomic vector
## [1] 0.9614181 1.8371975 2.9805346 4.3129193 4.8374819 6.3585267 7.1639700
## [8] 8.0847131 8.7010062 9.9062764
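The typed variants are also strict: each call to the function must return exactly one value of the right type, otherwise purrr stops with an error instead of silently changing the output shape. A minimal sketch, with tryCatch( ) used only so the script keeps running (the list x and the message text are invented here):

```r
library(purrr)

x <- list(1:2, 3:4)

# Fine: mean() returns a single double for each element
ok <- map_dbl(x, mean)

# Not fine: range() returns two values per element, so map_dbl() errors
bad <- tryCatch(
  map_dbl(x, range),
  error = function(e) "map_dbl() refused a length-2 result"
)

ok
bad
```

This is the main practical difference from sapply( ), which would quietly return a matrix in the second case.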
map_dfr( ) and map_dfc( )
All the purrr functions you’ve seen above return lists or vectors, but you may want to return a data frame.
Example 1.
myFunction <- function(arg1){
col <- arg1 * 2
data.frame(col = col) # return a one-column data frame explicitly
}
values <- c(1, 3, 5, 7, 9)
df <- map_dfr(values, myFunction) # binds the results row-wise
df <- map_dfc(values, myFunction) # binds the results column-wise
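To see what the row-wise version produces, here is a sketch with a hypothetical helper double_it( ), alongside a base R equivalent. It assumes purrr and dplyr are installed. Note that recent purrr releases (1.0.0 and later) mark map_dfr( ) and map_dfc( ) as superseded in favour of list_rbind( ) and list_cbind( ), though the older functions still work.

```r
library(purrr)

# Hypothetical helper: doubles its input and returns a one-row data frame
double_it <- function(v) data.frame(col = v * 2)
values <- c(1, 3, 5, 7, 9)

stacked_purrr <- map_dfr(values, double_it)                 # needs dplyr installed
stacked_base  <- do.call(rbind, lapply(values, double_it))  # base R equivalent

dim(stacked_purrr)  # 5 rows, 1 column
stacked_purrr$col   # 2 6 10 14 18
```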
Example 2.
mtcars %>%
split(.$cyl) %>%
map(~ lm(mpg ~ wt, data = .x)) %>%
map_dfr(~ as.data.frame(t(as.matrix(coef(.)))))
## (Intercept) wt
## 1 39.57120 -5.647025
## 2 28.40884 -2.780106
## 3 23.86803 -2.192438
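The rows in the output above are numbered 1 to 3, so the cylinder groups are no longer visible. One way to keep them, sketched below, is the .id argument of map_dfr( ), which stores the names of the input list in a new column. The column name "cyl" and the object names are choices made for this illustration.

```r
library(purrr)

# One model per cylinder group; the list is named "4", "6", "8"
models <- map(split(mtcars, mtcars$cyl), ~ lm(mpg ~ wt, data = .x))

# .id = "cyl" adds a column holding each model's list name
coefs <- map_dfr(models, ~ as.data.frame(t(coef(.x))), .id = "cyl")

coefs$cyl
names(coefs)
```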