suppressPackageStartupMessages(library("tidyverse"))
package 㤼㸱tidyverse㤼㸲 was built under R version 3.6.3
suppressPackageStartupMessages(library("stringr"))
#The package microbenchmark is used for timing code.
suppressPackageStartupMessages(library("microbenchmark"))
package 㤼㸱microbenchmark㤼㸲 was built under R version 3.6.3

1. Read the documentation for apply(). In the 2nd case, what two for-loops does it generalize.

For an object with two-dimensions, such as a matrix or data frame, apply() replaces looping over the rows or columns of a matrix or data-frame. The apply() function is used like apply(X, MARGIN, FUN, ...), where X is a matrix or array, FUN is a function to apply, and … are additional arguments passed to FUN.

When MARGIN = 1, then the function is applied to each row. For example, the following example calculates the row means of a matrix.

X <- matrix(rnorm(15), nrow = 5)
X
            [,1]       [,2]        [,3]
[1,]  1.27367038  0.3025072 -0.55797988
[2,] -0.15047721  0.6893436  1.91526339
[3,]  0.91338311  0.2370009  0.18501749
[4,] -0.09615776 -0.7576654  0.01170414
[5,]  0.64507221  0.9404452 -0.03963321
apply(X, 1, mean)
[1]  0.3393992  0.8180432  0.4451338 -0.2807063  0.5152947

That is equivalent to this for-loop.

X_row_means <- vector("numeric", length = nrow(X))
for (i in seq_len(nrow(X))) {
  X_row_means[[i]] <- mean(X[i, ])
}
X_row_means
[1]  0.3393992  0.8180432  0.4451338 -0.2807063  0.5152947
X <- matrix(rnorm(15), nrow = 5)
X
           [,1]       [,2]       [,3]
[1,] -0.1387458  1.5879591 -1.1505933
[2,] -0.9116576  0.7854943 -1.3605260
[3,] -1.1144945 -0.3911062 -1.0232832
[4,] -1.0380229  1.2678207  1.3070090
[5,] -0.2940543  0.1393491 -0.4864869

When MARGIN = 2, apply() is equivalent to a for-loop looping over columns.

apply(X, 2, mean)
[1] -0.6993950  0.6779034 -0.5427761
X_col_means <- vector("numeric", length = ncol(X))
for (i in seq_len(ncol(X))) {
  X_col_means[[i]] <- mean(X[, i])
}
X_col_means
[1] -0.6993950  0.6779034 -0.5427761

2. Adapt col_summary() so that it only applies to numeric columns. You might want to start with an is_numeric() function that returns a logical vector that has a TRUE corresponding to each numeric column.

The original col_summary() function is

col_summary <- function(df, fun) {
  out <- vector("double", length(df))
  for (i in seq_along(df)) {
    out[i] <- fun(df[[i]])
  }
  out
}

The adapted version adds extra logic to only apply the function to numeric columns.

col_summary2 <- function(df, fun) {
  # create an empty vector which will store whether each
  # column is numeric
  numeric_cols <- vector("logical", length(df))
  # test whether each column is numeric
  for (i in seq_along(df)) {
    numeric_cols[[i]] <- is.numeric(df[[i]])
  }
  # find the indexes of the numeric columns
  idxs <- which(numeric_cols)
  # find the number of numeric columns
  n <- sum(numeric_cols)
  # create a vector to hold the results
  out <- vector("double", n)
  # apply the function only to numeric vectors
  for (i in seq_along(idxs)) {
    out[[i]] <- fun(df[[idxs[[i]]]])
  }
  # name the vector
  names(out) <- names(df)[idxs]
  out
}

Let’s test that col_summary2() works by creating a small data frame with some numeric and non-numeric columns.

df <- tibble(
  X1 = c(1, 2, 3),
  X2 = c("A", "B", "C"),
  X3 = c(0, -1, 5),
  X4 = c(TRUE, FALSE, TRUE)
)
col_summary2(df, mean)
      X1       X3 
2.000000 1.333333 

As expected, it only calculates the mean of the numeric columns, X1 and X3. Let’s test that it works with another function.

col_summary2(df, median)
X1 X3 
 2  0 
LS0tDQp0aXRsZTogIkZvciBsb29wcyB2cy4gZnVuY3Rpb25hbHMiDQpvdXRwdXQ6IA0KICBodG1sX25vdGVib29rOg0KICAgIHRvYzogdHJ1ZQ0KICAgIHRvY19mbG9hdDogdHJ1ZQ0KLS0tDQoNCmBgYHtyfQ0Kc3VwcHJlc3NQYWNrYWdlU3RhcnR1cE1lc3NhZ2VzKGxpYnJhcnkoInRpZHl2ZXJzZSIpKQ0Kc3VwcHJlc3NQYWNrYWdlU3RhcnR1cE1lc3NhZ2VzKGxpYnJhcnkoInN0cmluZ3IiKSkNCiNUaGUgcGFja2FnZSBtaWNyb2JlbmNobWFyayBpcyB1c2VkIGZvciB0aW1pbmcgY29kZS4NCnN1cHByZXNzUGFja2FnZVN0YXJ0dXBNZXNzYWdlcyhsaWJyYXJ5KCJtaWNyb2JlbmNobWFyayIpKQ0KYGBgDQoNCiMjIyAxLiBSZWFkIHRoZSBkb2N1bWVudGF0aW9uIGZvciBgYXBwbHkoKWAuIEluIHRoZSAybmQgY2FzZSwgd2hhdCB0d28gZm9yLWxvb3BzIGRvZXMgaXQgZ2VuZXJhbGl6ZS4NCg0KRm9yIGFuIG9iamVjdCB3aXRoIHR3by1kaW1lbnNpb25zLCBzdWNoIGFzIGEgbWF0cml4IG9yIGRhdGEgZnJhbWUsIGBhcHBseSgpYCByZXBsYWNlcyBsb29waW5nIG92ZXIgdGhlIHJvd3Mgb3IgY29sdW1ucyBvZiBhIG1hdHJpeCBvciBkYXRhLWZyYW1lLiBUaGUgYGFwcGx5KClgIGZ1bmN0aW9uIGlzIHVzZWQgbGlrZSBgYXBwbHkoWCwgTUFSR0lOLCBGVU4sIC4uLilgLCB3aGVyZSBgWGAgaXMgYSBtYXRyaXggb3IgYXJyYXksIGBGVU5gIGlzIGEgZnVuY3Rpb24gdG8gYXBwbHksIGFuZCAuLi4gYXJlIGFkZGl0aW9uYWwgYXJndW1lbnRzIHBhc3NlZCB0byBgRlVOYC4NCg0KV2hlbiBgTUFSR0lOID0gMWAsIHRoZW4gdGhlIGZ1bmN0aW9uIGlzIGFwcGxpZWQgdG8gZWFjaCByb3cuIEZvciBleGFtcGxlLCB0aGUgZm9sbG93aW5nIGV4YW1wbGUgY2FsY3VsYXRlcyB0aGUgcm93IG1lYW5zIG9mIGEgbWF0cml4Lg0KDQpgYGB7cn0NClggPC0gbWF0cml4KHJub3JtKDE1KSwgbnJvdyA9IDUpDQpYDQphcHBseShYLCAxLCBtZWFuKQ0KYGBgDQoNClRoYXQgaXMgZXF1aXZhbGVudCB0byB0aGlzIGZvci1sb29wLg0KDQpgYGB7cn0NClhfcm93X21lYW5zIDwtIHZlY3RvcigibnVtZXJpYyIsIGxlbmd0aCA9IG5yb3coWCkpDQpmb3IgKGkgaW4gc2VxX2xlbihucm93KFgpKSkgew0KICBYX3Jvd19tZWFuc1tbaV1dIDwtIG1lYW4oWFtpLCBdKQ0KfQ0KWF9yb3dfbWVhbnMNClggPC0gbWF0cml4KHJub3JtKDE1KSwgbnJvdyA9IDUpDQpYDQpgYGANCg0KV2hlbiBgTUFSR0lOID0gMmAsIGBhcHBseSgpYCBpcyBlcXVpdmFsZW50IHRvIGEgZm9yLWxvb3AgbG9vcGluZyBvdmVyIGNvbHVtbnMuDQoNCmBgYHtyfQ0KYXBwbHkoWCwgMiwgbWVhbikNClhfY29sX21lYW5zIDwtIHZlY3RvcigibnVtZXJpYyIsIGxlbmd0aCA9IG5jb2woWCkpDQpmb3IgKGkgaW4gc2VxX2xlbihuY29sKFgpKSkgew0KICBYX2NvbF9tZWFuc1tbaV1dIDwtIG1lYW4oWFssIGldKQ0KfQ0KWF9jb2xfbWVhbnMNCmBgYA0KDQojIyMgMi4gQWRhcHQgYGNvbF9zdW1tYXJ5KClgIHNvIHRoYXQgaXQgb25seSBhcHBsaWVzIHRvIG51bWVyaWMgY29sdW1ucy4gWW91IG1pZ2h0IHdhbnQgdG8gc3RhcnQgd2l0aCBhbiBgaXNfbnVtZXJpYygpYCBmdW5jdGlvbiB0aGF0IHJldHVybnMgYSBsb2dpY2FsIHZlY3RvciB0aGF0IGhhcyBhIFRSVUUgY29ycmVzcG9uZGluZyB0byBlYWNoIG51bWVyaWMgY29sdW1uLg0KDQpUaGUgb3JpZ2luYWwgYGNvbF9zdW1tYXJ5KClgIGZ1bmN0aW9uIGlzDQoNCmBgYHtyfQ0KY29sX3N1bW1hcnkgPC0gZnVuY3Rpb24oZGYsIGZ1bikgew0KICBvdXQgPC0gdmVjdG9yKCJkb3VibGUiLCBsZW5ndGgoZGYpKQ0KICBmb3IgKGkgaW4gc2VxX2Fsb25nKGRmKSkgew0KICAgIG91dFtpXSA8LSBmdW4oZGZbW2ldXSkNCiAgfQ0KICBvdXQNCn0NCmBgYA0KDQpUaGUgYWRhcHRlZCB2ZXJzaW9uIGFkZHMgZXh0cmEgbG9naWMgdG8gb25seSBhcHBseSB0aGUgZnVuY3Rpb24gdG8gbnVtZXJpYyBjb2x1bW5zLg0KDQpgYGB7cn0NCmNvbF9zdW1tYXJ5MiA8LSBmdW5jdGlvbihkZiwgZnVuKSB7DQogICMgY3JlYXRlIGFuIGVtcHR5IHZlY3RvciB3aGljaCB3aWxsIHN0b3JlIHdoZXRoZXIgZWFjaA0KICAjIGNvbHVtbiBpcyBudW1lcmljDQogIG51bWVyaWNfY29scyA8LSB2ZWN0b3IoImxvZ2ljYWwiLCBsZW5ndGgoZGYpKQ0KICAjIHRlc3Qgd2hldGhlciBlYWNoIGNvbHVtbiBpcyBudW1lcmljDQogIGZvciAoaSBpbiBzZXFfYWxvbmcoZGYpKSB7DQogICAgbnVtZXJpY19jb2xzW1tpXV0gPC0gaXMubnVtZXJpYyhkZltbaV1dKQ0KICB9DQogICMgZmluZCB0aGUgaW5kZXhlcyBvZiB0aGUgbnVtZXJpYyBjb2x1bW5zDQogIGlkeHMgPC0gd2hpY2gobnVtZXJpY19jb2xzKQ0KICAjIGZpbmQgdGhlIG51bWJlciBvZiBudW1lcmljIGNvbHVtbnMNCiAgbiA8LSBzdW0obnVtZXJpY19jb2xzKQ0KICAjIGNyZWF0ZSBhIHZlY3RvciB0byBob2xkIHRoZSByZXN1bHRzDQogIG91dCA8LSB2ZWN0b3IoImRvdWJsZSIsIG4pDQogICMgYXBwbHkgdGhlIGZ1bmN0aW9uIG9ubHkgdG8gbnVtZXJpYyB2ZWN0b3JzDQogIGZvciAoaSBpbiBzZXFfYWxvbmcoaWR4cykpIHsNCiAgICBvdXRbW2ldXSA8LSBmdW4oZGZbW2lkeHNbW2ldXV1dKQ0KICB9DQogICMgbmFtZSB0aGUgdmVjdG9yDQogIG5hbWVzKG91dCkgPC0gbmFtZXMoZGYpW2lkeHNdDQogIG91dA0KfQ0KYGBgDQoNCkxldOKAmXMgdGVzdCB0aGF0IGBjb2xfc3VtbWFyeTIoKWAgd29ya3MgYnkgY3JlYXRpbmcgYSBzbWFsbCBkYXRhIGZyYW1lIHdpdGggc29tZSBudW1lcmljIGFuZCBub24tbnVtZXJpYyBjb2x1bW5zLg0KDQpgYGB7cn0NCmRmIDwtIHRpYmJsZSgNCiAgWDEgPSBjKDEsIDIsIDMpLA0KICBYMiA9IGMoIkEiLCAiQiIsICJDIiksDQogIFgzID0gYygwLCAtMSwgNSksDQogIFg0ID0gYyhUUlVFLCBGQUxTRSwgVFJVRSkNCikNCmNvbF9zdW1tYXJ5MihkZiwgbWVhbikNCmBgYA0KDQpBcyBleHBlY3RlZCwgaXQgb25seSBjYWxjdWxhdGVzIHRoZSBtZWFuIG9mIHRoZSBudW1lcmljIGNvbHVtbnMsIGBYMWAgYW5kIGBYM2AuIExldOKAmXMgdGVzdCB0aGF0IGl0IHdvcmtzIHdpdGggYW5vdGhlciBmdW5jdGlvbi4NCg0KYGBge3J9DQpjb2xfc3VtbWFyeTIoZGYsIG1lZGlhbikNCmBgYA==