suppressPackageStartupMessages(library("tidyverse"))
package 㤼㸱tidyverse㤼㸲 was built under R version 3.6.3
suppressPackageStartupMessages(library("stringr"))
#The package microbenchmark is used for timing code.
suppressPackageStartupMessages(library("microbenchmark"))
package 㤼㸱microbenchmark㤼㸲 was built under R version 3.6.3
1. Read the documentation for apply()
. In the 2nd case, what two for-loops does it generalize.
For an object with two-dimensions, such as a matrix or data frame, apply()
replaces looping over the rows or columns of a matrix or data-frame. The apply()
function is used like apply(X, MARGIN, FUN, ...)
, where X
is a matrix or array, FUN
is a function to apply, and … are additional arguments passed to FUN
.
When MARGIN = 1
, then the function is applied to each row. For example, the following example calculates the row means of a matrix.
X <- matrix(rnorm(15), nrow = 5)
X
[,1] [,2] [,3]
[1,] 1.27367038 0.3025072 -0.55797988
[2,] -0.15047721 0.6893436 1.91526339
[3,] 0.91338311 0.2370009 0.18501749
[4,] -0.09615776 -0.7576654 0.01170414
[5,] 0.64507221 0.9404452 -0.03963321
apply(X, 1, mean)
[1] 0.3393992 0.8180432 0.4451338 -0.2807063 0.5152947
That is equivalent to this for-loop.
X_row_means <- vector("numeric", length = nrow(X))
for (i in seq_len(nrow(X))) {
X_row_means[[i]] <- mean(X[i, ])
}
X_row_means
[1] 0.3393992 0.8180432 0.4451338 -0.2807063 0.5152947
X <- matrix(rnorm(15), nrow = 5)
X
[,1] [,2] [,3]
[1,] -0.1387458 1.5879591 -1.1505933
[2,] -0.9116576 0.7854943 -1.3605260
[3,] -1.1144945 -0.3911062 -1.0232832
[4,] -1.0380229 1.2678207 1.3070090
[5,] -0.2940543 0.1393491 -0.4864869
When MARGIN = 2
, apply()
is equivalent to a for-loop looping over columns.
apply(X, 2, mean)
[1] -0.6993950 0.6779034 -0.5427761
X_col_means <- vector("numeric", length = ncol(X))
for (i in seq_len(ncol(X))) {
X_col_means[[i]] <- mean(X[, i])
}
X_col_means
[1] -0.6993950 0.6779034 -0.5427761
2. Adapt col_summary()
so that it only applies to numeric columns. You might want to start with an is_numeric()
function that returns a logical vector that has a TRUE corresponding to each numeric column.
The original col_summary()
function is
col_summary <- function(df, fun) {
out <- vector("double", length(df))
for (i in seq_along(df)) {
out[i] <- fun(df[[i]])
}
out
}
The adapted version adds extra logic to only apply the function to numeric columns.
col_summary2 <- function(df, fun) {
# create an empty vector which will store whether each
# column is numeric
numeric_cols <- vector("logical", length(df))
# test whether each column is numeric
for (i in seq_along(df)) {
numeric_cols[[i]] <- is.numeric(df[[i]])
}
# find the indexes of the numeric columns
idxs <- which(numeric_cols)
# find the number of numeric columns
n <- sum(numeric_cols)
# create a vector to hold the results
out <- vector("double", n)
# apply the function only to numeric vectors
for (i in seq_along(idxs)) {
out[[i]] <- fun(df[[idxs[[i]]]])
}
# name the vector
names(out) <- names(df)[idxs]
out
}
Let’s test that col_summary2()
works by creating a small data frame with some numeric and non-numeric columns.
df <- tibble(
X1 = c(1, 2, 3),
X2 = c("A", "B", "C"),
X3 = c(0, -1, 5),
X4 = c(TRUE, FALSE, TRUE)
)
col_summary2(df, mean)
X1 X3
2.000000 1.333333
As expected, it only calculates the mean of the numeric columns, X1
and X3
. Let’s test that it works with another function.
col_summary2(df, median)
X1 X3
2 0
LS0tDQp0aXRsZTogIkZvciBsb29wcyB2cy4gZnVuY3Rpb25hbHMiDQpvdXRwdXQ6IA0KICBodG1sX25vdGVib29rOg0KICAgIHRvYzogdHJ1ZQ0KICAgIHRvY19mbG9hdDogdHJ1ZQ0KLS0tDQoNCmBgYHtyfQ0Kc3VwcHJlc3NQYWNrYWdlU3RhcnR1cE1lc3NhZ2VzKGxpYnJhcnkoInRpZHl2ZXJzZSIpKQ0Kc3VwcHJlc3NQYWNrYWdlU3RhcnR1cE1lc3NhZ2VzKGxpYnJhcnkoInN0cmluZ3IiKSkNCiNUaGUgcGFja2FnZSBtaWNyb2JlbmNobWFyayBpcyB1c2VkIGZvciB0aW1pbmcgY29kZS4NCnN1cHByZXNzUGFja2FnZVN0YXJ0dXBNZXNzYWdlcyhsaWJyYXJ5KCJtaWNyb2JlbmNobWFyayIpKQ0KYGBgDQoNCiMjIyAxLiBSZWFkIHRoZSBkb2N1bWVudGF0aW9uIGZvciBgYXBwbHkoKWAuIEluIHRoZSAybmQgY2FzZSwgd2hhdCB0d28gZm9yLWxvb3BzIGRvZXMgaXQgZ2VuZXJhbGl6ZS4NCg0KRm9yIGFuIG9iamVjdCB3aXRoIHR3by1kaW1lbnNpb25zLCBzdWNoIGFzIGEgbWF0cml4IG9yIGRhdGEgZnJhbWUsIGBhcHBseSgpYCByZXBsYWNlcyBsb29waW5nIG92ZXIgdGhlIHJvd3Mgb3IgY29sdW1ucyBvZiBhIG1hdHJpeCBvciBkYXRhLWZyYW1lLiBUaGUgYGFwcGx5KClgIGZ1bmN0aW9uIGlzIHVzZWQgbGlrZSBgYXBwbHkoWCwgTUFSR0lOLCBGVU4sIC4uLilgLCB3aGVyZSBgWGAgaXMgYSBtYXRyaXggb3IgYXJyYXksIGBGVU5gIGlzIGEgZnVuY3Rpb24gdG8gYXBwbHksIGFuZCAuLi4gYXJlIGFkZGl0aW9uYWwgYXJndW1lbnRzIHBhc3NlZCB0byBgRlVOYC4NCg0KV2hlbiBgTUFSR0lOID0gMWAsIHRoZW4gdGhlIGZ1bmN0aW9uIGlzIGFwcGxpZWQgdG8gZWFjaCByb3cuIEZvciBleGFtcGxlLCB0aGUgZm9sbG93aW5nIGV4YW1wbGUgY2FsY3VsYXRlcyB0aGUgcm93IG1lYW5zIG9mIGEgbWF0cml4Lg0KDQpgYGB7cn0NClggPC0gbWF0cml4KHJub3JtKDE1KSwgbnJvdyA9IDUpDQpYDQphcHBseShYLCAxLCBtZWFuKQ0KYGBgDQoNClRoYXQgaXMgZXF1aXZhbGVudCB0byB0aGlzIGZvci1sb29wLg0KDQpgYGB7cn0NClhfcm93X21lYW5zIDwtIHZlY3RvcigibnVtZXJpYyIsIGxlbmd0aCA9IG5yb3coWCkpDQpmb3IgKGkgaW4gc2VxX2xlbihucm93KFgpKSkgew0KICBYX3Jvd19tZWFuc1tbaV1dIDwtIG1lYW4oWFtpLCBdKQ0KfQ0KWF9yb3dfbWVhbnMNClggPC0gbWF0cml4KHJub3JtKDE1KSwgbnJvdyA9IDUpDQpYDQpgYGANCg0KV2hlbiBgTUFSR0lOID0gMmAsIGBhcHBseSgpYCBpcyBlcXVpdmFsZW50IHRvIGEgZm9yLWxvb3AgbG9vcGluZyBvdmVyIGNvbHVtbnMuDQoNCmBgYHtyfQ0KYXBwbHkoWCwgMiwgbWVhbikNClhfY29sX21lYW5zIDwtIHZlY3RvcigibnVtZXJpYyIsIGxlbmd0aCA9IG5jb2woWCkpDQpmb3IgKGkgaW4gc2VxX2xlbihuY29sKFgpKSkgew0KICBYX2NvbF9tZWFuc1tbaV1dIDwtIG1lYW4oWFssIGldKQ0KfQ0KWF9jb2xfbWVhbnMNCmBgYA0KDQojIyMgMi4gQWRhcHQgYGNvbF9zdW1tYXJ5KClgIHNvIHRoYXQgaXQgb25seSBhcHBsaWVzIHRvIG51bWVyaWMgY29sdW1ucy4gWW91IG1pZ2h0IHdhbnQgdG8gc3RhcnQgd2l0aCBhbiBgaXNfbnVtZXJpYygpYCBmdW5jdGlvbiB0aGF0IHJldHVybnMgYSBsb2dpY2FsIHZlY3RvciB0aGF0IGhhcyBhIFRSVUUgY29ycmVzcG9uZGluZyB0byBlYWNoIG51bWVyaWMgY29sdW1uLg0KDQpUaGUgb3JpZ2luYWwgYGNvbF9zdW1tYXJ5KClgIGZ1bmN0aW9uIGlzDQoNCmBgYHtyfQ0KY29sX3N1bW1hcnkgPC0gZnVuY3Rpb24oZGYsIGZ1bikgew0KICBvdXQgPC0gdmVjdG9yKCJkb3VibGUiLCBsZW5ndGgoZGYpKQ0KICBmb3IgKGkgaW4gc2VxX2Fsb25nKGRmKSkgew0KICAgIG91dFtpXSA8LSBmdW4oZGZbW2ldXSkNCiAgfQ0KICBvdXQNCn0NCmBgYA0KDQpUaGUgYWRhcHRlZCB2ZXJzaW9uIGFkZHMgZXh0cmEgbG9naWMgdG8gb25seSBhcHBseSB0aGUgZnVuY3Rpb24gdG8gbnVtZXJpYyBjb2x1bW5zLg0KDQpgYGB7cn0NCmNvbF9zdW1tYXJ5MiA8LSBmdW5jdGlvbihkZiwgZnVuKSB7DQogICMgY3JlYXRlIGFuIGVtcHR5IHZlY3RvciB3aGljaCB3aWxsIHN0b3JlIHdoZXRoZXIgZWFjaA0KICAjIGNvbHVtbiBpcyBudW1lcmljDQogIG51bWVyaWNfY29scyA8LSB2ZWN0b3IoImxvZ2ljYWwiLCBsZW5ndGgoZGYpKQ0KICAjIHRlc3Qgd2hldGhlciBlYWNoIGNvbHVtbiBpcyBudW1lcmljDQogIGZvciAoaSBpbiBzZXFfYWxvbmcoZGYpKSB7DQogICAgbnVtZXJpY19jb2xzW1tpXV0gPC0gaXMubnVtZXJpYyhkZltbaV1dKQ0KICB9DQogICMgZmluZCB0aGUgaW5kZXhlcyBvZiB0aGUgbnVtZXJpYyBjb2x1bW5zDQogIGlkeHMgPC0gd2hpY2gobnVtZXJpY19jb2xzKQ0KICAjIGZpbmQgdGhlIG51bWJlciBvZiBudW1lcmljIGNvbHVtbnMNCiAgbiA8LSBzdW0obnVtZXJpY19jb2xzKQ0KICAjIGNyZWF0ZSBhIHZlY3RvciB0byBob2xkIHRoZSByZXN1bHRzDQogIG91dCA8LSB2ZWN0b3IoImRvdWJsZSIsIG4pDQogICMgYXBwbHkgdGhlIGZ1bmN0aW9uIG9ubHkgdG8gbnVtZXJpYyB2ZWN0b3JzDQogIGZvciAoaSBpbiBzZXFfYWxvbmcoaWR4cykpIHsNCiAgICBvdXRbW2ldXSA8LSBmdW4oZGZbW2lkeHNbW2ldXV1dKQ0KICB9DQogICMgbmFtZSB0aGUgdmVjdG9yDQogIG5hbWVzKG91dCkgPC0gbmFtZXMoZGYpW2lkeHNdDQogIG91dA0KfQ0KYGBgDQoNCkxldOKAmXMgdGVzdCB0aGF0IGBjb2xfc3VtbWFyeTIoKWAgd29ya3MgYnkgY3JlYXRpbmcgYSBzbWFsbCBkYXRhIGZyYW1lIHdpdGggc29tZSBudW1lcmljIGFuZCBub24tbnVtZXJpYyBjb2x1bW5zLg0KDQpgYGB7cn0NCmRmIDwtIHRpYmJsZSgNCiAgWDEgPSBjKDEsIDIsIDMpLA0KICBYMiA9IGMoIkEiLCAiQiIsICJDIiksDQogIFgzID0gYygwLCAtMSwgNSksDQogIFg0ID0gYyhUUlVFLCBGQUxTRSwgVFJVRSkNCikNCmNvbF9zdW1tYXJ5MihkZiwgbWVhbikNCmBgYA0KDQpBcyBleHBlY3RlZCwgaXQgb25seSBjYWxjdWxhdGVzIHRoZSBtZWFuIG9mIHRoZSBudW1lcmljIGNvbHVtbnMsIGBYMWAgYW5kIGBYM2AuIExldOKAmXMgdGVzdCB0aGF0IGl0IHdvcmtzIHdpdGggYW5vdGhlciBmdW5jdGlvbi4NCg0KYGBge3J9DQpjb2xfc3VtbWFyeTIoZGYsIG1lZGlhbikNCmBgYA==