suppressPackageStartupMessages(library(nycflights13))
package ‘nycflights13’ was built under R version 3.6.3
suppressPackageStartupMessages(library(tidyverse))
package ‘tidyverse’ was built under R version 3.6.3

1. Brainstorm as many ways as possible to select dep_time, dep_delay, arr_time, and arr_delay from flights.

These are a few ways to select columns.

  • Specify column names as unquoted variable names.
select(flights, dep_time, dep_delay, arr_time, arr_delay)
  • Specify column names as strings.
select(flights, "dep_time", "dep_delay", "arr_time", "arr_delay")
  • Specify the column numbers of the variables.
select(flights, 4, 6, 7, 9)

This works, but is not good practice for two reasons. First, the column location of variables may change, resulting in code that may continue to run without error, but produce the wrong answer. Second, the code is obfuscated, since it is not clear from the code which variables are being selected. What variable does column 6 correspond to? I just wrote the code, and I’ve already forgotten.
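To make the first risk concrete, here is a minimal sketch. The data frame `toy` is a hypothetical one-row stand-in that only mirrors the first nine column names of flights, and the reordering step is an assumed example of an upstream change, not something taken from the original data.

```r
library(dplyr)

# Hypothetical one-row stand-in mirroring the first nine column names of flights
toy <- data.frame(
  year = 2013L, month = 1L, day = 1L,
  dep_time = 517L, sched_dep_time = 515L, dep_delay = 2,
  arr_time = 830L, sched_arr_time = 819L, arr_delay = 11
)

# With the original column order, positions 4, 6, 7, 9 are the intended variables
by_position <- names(select(toy, 4, 6, 7, 9))

# Simulate an upstream change: sched_dep_time moves to the last position
new_order <- c(setdiff(names(toy), "sched_dep_time"), "sched_dep_time")
reordered <- toy[new_order]

# The same call still runs without error, but now picks different columns
after_reorder <- names(select(reordered, 4, 6, 7, 9))

print(by_position)
print(after_reorder)
```

Selecting by name would have returned the same four variables in both cases.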

  • Specify the names of the variables with a character vector and one_of().
select(flights, one_of(c("dep_time", "dep_delay", "arr_time", "arr_delay")))

This is useful because the names of the variables can be stored in a variable and passed to one_of().

variables <- c("dep_time", "dep_delay", "arr_time", "arr_delay")
select(flights, one_of(variables))
  • Selecting the variables by matching the start of their names using starts_with().
select(flights, starts_with("dep_"), starts_with("arr_"))

  • Selecting the variables using regular expressions with matches(). Regular expressions provide a flexible way to match string patterns and are discussed later.

select(flights, matches("^(dep|arr)_(time|delay)$"))
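As a rough check of why the anchors `^` and `$` matter in that pattern, the sketch below applies the same regular expression to the column names with base R's `grep()`. The vector `flight_cols` is typed out by hand here as an assumption about the column names of flights, and `grep()` only approximates what matches() does internally.

```r
# Column names of flights, written out by hand for a self-contained example
flight_cols <- c(
  "year", "month", "day", "dep_time", "sched_dep_time", "dep_delay",
  "arr_time", "sched_arr_time", "arr_delay", "carrier", "flight",
  "tailnum", "origin", "dest", "air_time", "distance", "hour",
  "minute", "time_hour"
)

# Anchored pattern: the whole name must be (dep|arr)_(time|delay)
anchored <- grep("^(dep|arr)_(time|delay)$", flight_cols, value = TRUE)

# Without anchors, the sched_* columns match as well
unanchored <- grep("(dep|arr)_(time|delay)", flight_cols, value = TRUE)

print(anchored)
print(unanchored)
```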

Some things that don’t work are:

  • Matching the ends of their names using ends_with(), since this will incorrectly include other variables such as sched_arr_time and sched_dep_time. For example,
select(flights, ends_with("arr_time"), ends_with("dep_time"))
  • Matching the names using contains(), since there is no pattern that includes all of these variables without incorrectly including others.
select(flights, contains("_time"), contains("arr_"))
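To see exactly what these two attempts get wrong, the following sketch compares their results with the four wanted names. The data frame `toy` is a hypothetical one-row stand-in built from a hand-typed list of the flights column names; its values are placeholders.

```r
library(dplyr)

# Hand-typed column names of flights; the values in toy are placeholders
flight_cols <- c(
  "year", "month", "day", "dep_time", "sched_dep_time", "dep_delay",
  "arr_time", "sched_arr_time", "arr_delay", "carrier", "flight",
  "tailnum", "origin", "dest", "air_time", "distance", "hour",
  "minute", "time_hour"
)
toy <- as.data.frame(setNames(as.list(seq_along(flight_cols)), flight_cols))

wanted <- c("dep_time", "dep_delay", "arr_time", "arr_delay")

ends_attempt <- names(select(toy, ends_with("arr_time"), ends_with("dep_time")))
contains_attempt <- names(select(toy, contains("_time"), contains("arr_")))

# Wanted columns that each attempt misses
missed_by_ends <- setdiff(wanted, ends_attempt)
missed_by_contains <- setdiff(wanted, contains_attempt)

# Unwanted columns that each attempt drags in
extra_from_ends <- setdiff(ends_attempt, wanted)
extra_from_contains <- setdiff(contains_attempt, wanted)

print(list(missed_by_ends, extra_from_ends))
print(list(missed_by_contains, extra_from_contains))
```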

2. What happens if you include the name of a variable multiple times in a select() call?

The select() call ignores the duplication. Any duplicated variables are only included once, in the first location they appear. The select() function does not raise an error or warning or print any message if there are duplicated variables.

select(flights, year, month, day, year, year)

This behavior is useful because it means that we can use select() with everything() in order to easily change the order of columns without having to specify the names of all the columns.

select(flights, arr_delay, everything())
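Both behaviors can be checked on a small example. The data frame `toy` below is a hypothetical stand-in that mirrors the first nine column names of flights; only the resulting column names matter.

```r
library(dplyr)

# Hypothetical one-row stand-in mirroring the first nine column names of flights
toy <- data.frame(
  year = 2013L, month = 1L, day = 1L,
  dep_time = 517L, sched_dep_time = 515L, dep_delay = 2,
  arr_time = 830L, sched_arr_time = 819L, arr_delay = 11
)

# Repeated names are kept once, at the position of their first appearance
deduplicated <- names(select(toy, year, month, day, year, year))

# everything() re-selects arr_delay, but the duplicate is dropped,
# so the column simply moves to the front
reordered <- names(select(toy, arr_delay, everything()))

print(deduplicated)
print(reordered)
```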

3. What does the one_of() function do? Why might it be helpful in conjunction with this vector?

The one_of() function selects variables using a character vector rather than unquoted variable names. This is useful because character vectors of variable names are easy to generate programmatically, whereas unquoted variable names are easier to type by hand but hard to generate in code.

vars <- c("year", "month", "day", "dep_delay", "arr_delay")
select(flights, one_of(vars))
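As a small illustration of generating the names in code, the sketch below builds them with `paste0()` and passes the result to one_of(). The data frame `toy` and the vectors `prefixes` and `delay_vars` are hypothetical names introduced only for this example.

```r
library(dplyr)

# Hypothetical one-row stand-in mirroring the first nine column names of flights
toy <- data.frame(
  year = 2013L, month = 1L, day = 1L,
  dep_time = 517L, sched_dep_time = 515L, dep_delay = 2,
  arr_time = 830L, sched_arr_time = 819L, arr_delay = 11
)

# Build the column names as strings instead of typing each one
prefixes <- c("dep", "arr")
delay_vars <- paste0(prefixes, "_delay")

# one_of() accepts the generated character vector directly
selected <- names(select(toy, one_of(delay_vars)))
print(selected)
```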

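In the versions of dplyr that were current when this was written, you can also provide the name of a vector containing the variable names you wish to select. Newer tidyselect releases flag this form as ambiguous and recommend wrapping the vector in all_of().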

select(flights, vars)

However, there is a problem with this: it is not clear whether vars refers to a column name or to a variable. How the code is evaluated depends on whether vars is a column in flights. If vars were a column in flights, the code would select only the vars column. If vars is not a column in flights, as is the case here, then select() replaces it with its value and selects those columns. If the vector has the same name as a column, or to ensure that it will never conflict with the names of the columns in the data frame, use the !!! (bang-bang-bang) operator.

select(flights, !!!vars)

This behavior, which is used by many tidyverse functions, is an example of what is called non-standard evaluation (NSE) in R. See the dplyr vignette, Programming with dplyr, for more information on this topic.
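The ambiguity is easiest to see with a data frame that really does have a column called vars. The sketch below uses a hypothetical data frame `df` for that purpose. It also shows all_of() and any_of(), which do not appear in the original text; they are helpers from newer tidyselect releases (re-exported by dplyr 1.0.0 and later), so treat their use here as an assumption about the installed version.

```r
library(dplyr)

# Hypothetical data frame that happens to contain a column named "vars"
df <- data.frame(vars = 1, year = 2013, month = 1)

# External character vector with the same name as that column
vars <- c("year", "month")

# A bare name is looked up in the data first, so the column wins
bare <- names(select(df, vars))

# Splicing with !!! inserts the strings themselves into the call
spliced <- names(select(df, !!!vars))

# all_of() evaluates its argument outside the data frame (newer tidyselect)
strict <- names(select(df, all_of(vars)))

# any_of() silently skips names that are not present
lenient <- names(select(df, any_of(c("year", "not_a_column"))))

print(bare)
print(spliced)
print(strict)
print(lenient)
```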

4. Does the result of running the following code surprise you? How do the select helpers deal with case by default? How can you change that default?

select(flights, contains("TIME"))

The default behavior for contains() is to ignore case. This may or may not surprise you. If this behavior does not surprise you, that could be why it is the default. Users searching for variable names probably have a better sense of the letters in the variable than of their capitalization. A second, technical reason is that dplyr works with more than R data frames. It can also work with a variety of databases. Some of these database engines have case-insensitive column names, so making functions that match variable names case-insensitive by default makes the behavior of select() consistent regardless of whether the table is stored as an R data frame or in a database.

To change the behavior, add the argument ignore.case = FALSE.

select(flights, contains("TIME", ignore.case = FALSE))
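The difference between the two calls can be checked on column names alone. The data frame `toy` below is a hypothetical one-row stand-in built from a hand-typed list of the flights column names, with placeholder values.

```r
library(dplyr)

# Hand-typed column names of flights; the values in toy are placeholders
flight_cols <- c(
  "year", "month", "day", "dep_time", "sched_dep_time", "dep_delay",
  "arr_time", "sched_arr_time", "arr_delay", "carrier", "flight",
  "tailnum", "origin", "dest", "air_time", "distance", "hour",
  "minute", "time_hour"
)
toy <- as.data.frame(setNames(as.list(seq_along(flight_cols)), flight_cols))

# Default: case is ignored, so "TIME" matches every name containing "time"
insensitive <- names(select(toy, contains("TIME")))

# Case-sensitive: no column name contains upper-case "TIME"
sensitive <- names(select(toy, contains("TIME", ignore.case = FALSE)))

print(insensitive)
print(length(sensitive))
```

The other string-matching helpers, such as starts_with(), ends_with(), and matches(), take the same ignore.case argument.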