suppressPackageStartupMessages(library("tidyverse"))
package ‘tidyverse’ was built under R version 3.6.3
1. Implement your own version of every() using a for loop. Compare it with purrr::every(). What does purrr’s version do that your version doesn’t?
# Use ... to pass arguments to the function
every2 <- function(.x, .p, ...) {
  for (i in .x) {
    if (!.p(i, ...)) {
      # If any is FALSE we know not all of them were TRUE
      return(FALSE)
    }
  }
  # If nothing was FALSE, then it is TRUE
  TRUE
}
every2(1:3, function(x) {
  x > 1
})
[1] FALSE
every2(1:3, function(x) {
  x > 0
})
[1] TRUE
The function purrr::every() does more with the predicate argument .p than every2() does. It accepts a formula shorthand such as ~ .x > 1 in place of a function, and it can test a named element when the elements of .x are lists.
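As a rough illustration of that difference, the sketch below passes the same formula predicate to both functions. It repeats every2() so that it runs on its own; the `flags` list and its `ok` element are made up for the example, and the exact behavior of purrr::every() may vary between purrr versions.

```r
library(purrr)

# every2() as defined above, repeated so this block is self-contained
every2 <- function(.x, .p, ...) {
  for (i in .x) {
    if (!.p(i, ...)) {
      return(FALSE)
    }
  }
  TRUE
}

# purrr::every() converts the formula into a function before calling it
every(1:3, ~ .x > 0)

# every2() calls .p directly, so a formula raises an error;
# tryCatch() turns that error into a message we can inspect
every2_result <- tryCatch(
  every2(1:3, ~ .x > 0),
  error = function(e) "error: .p is not a function"
)
every2_result

# Hypothetical data: a string predicate extracts the element named "ok"
flags <- list(list(ok = TRUE), list(ok = TRUE))
every(flags, "ok")
```

Supporting formulas in every2() would mean converting .p first, for example with purrr::as_mapper(), at which point it is simpler to call purrr::every() directly.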
2. Create an enhanced col_summary() that applies a summary function to every numeric column in a data frame.
I will use map() to apply the function to each column, and keep() to select only the numeric columns.
col_sum2 <- function(df, f, ...) {
  map(keep(df, is.numeric), f, ...)
}
col_sum2(iris, mean)
$Sepal.Length
[1] 5.843333
$Sepal.Width
[1] 3.057333
$Petal.Length
[1] 3.758
$Petal.Width
[1] 1.199333
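Since col_sum2() returns a list, one possible variation is to use map_dbl() so the result is a named numeric vector. The sketch below assumes the summary function returns a single number per column; the name col_sum2_dbl() is hypothetical and not part of the original exercise.

```r
library(purrr)

# Hypothetical variant of col_sum2(): map_dbl() returns a named double vector
# and raises an error if f does not return a single number for a column
col_sum2_dbl <- function(df, f, ...) {
  map_dbl(keep(df, is.numeric), f, ...)
}

col_means <- col_sum2_dbl(iris, mean)
col_means

# Extra arguments are forwarded to f through ...
col_sum2_dbl(iris, mean, na.rm = TRUE)
```

The trade-off is that map_dbl() rejects summary functions that return more than one value per column, such as range(), whereas the list returned by map() can hold them.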
3. A possible base R equivalent of col_summary() is:
col_sum3 <- function(df, f) {
  is_num <- sapply(df, is.numeric)
  df_num <- df[, is_num]
  sapply(df_num, f)
}
But it has a number of bugs as illustrated with the following inputs:
df <- tibble(
  x = 1:3,
  y = 3:1,
  z = c("a", "b", "c")
)
# OK
col_sum3(df, mean)
x y
2 2
# Has problems: doesn't always return a numeric vector
col_sum3(df[1:2], mean)
x y
2 2
col_sum3(df[1], mean)
x
2
# col_sum3(df[0], mean)
#> Error: Can't subset with `[` using an object of class list.
What causes these bugs?
The cause of these bugs is the behavior of sapply(). The sapply() function does not guarantee the type of vector it returns, and returns different types of vectors depending on its inputs. If no columns are selected, instead of returning an empty logical vector, it returns an empty list. This causes an error since we can’t subset with [ using a list.
sapply(df[0], is.numeric)
named list()
sapply(df[1], is.numeric)
x
TRUE
sapply(df[1:2], is.numeric)
x y
TRUE TRUE
The sapply() function tries to be helpful by simplifying the results, but this behavior can be counterproductive. It is okay to use the sapply() function interactively, but avoid programming with it.
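One way to avoid the problem in base R is vapply(), which takes a template for the return value and so always returns the declared type. The sketch below is an assumption about how col_sum3() could be repaired, not part of the original exercise; the name col_sum4() is hypothetical, and a plain data.frame is used so the block needs no packages.

```r
# Hypothetical fix: vapply() with an explicit template fixes the return type
col_sum4 <- function(df, f, ...) {
  # Always a logical vector, even when df has zero columns
  is_num <- vapply(df, is.numeric, logical(1))
  # List-style subsetting keeps a data frame for any number of columns
  df_num <- df[is_num]
  # Always a numeric vector; errors if f returns anything but one number
  vapply(df_num, f, numeric(1), ...)
}

df <- data.frame(
  x = 1:3,
  y = 3:1,
  z = c("a", "b", "c")
)

col_sum4(df, mean)
col_sum4(df[1:2], mean)
col_sum4(df[1], mean)
# With no columns the result is an empty numeric vector rather than an error
col_sum4(df[0], mean)
```

In tidyverse code, map_lgl() and map_dbl() from purrr give the same kind of type guarantee.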