suppressPackageStartupMessages(library("tidyverse"))
package ‘tidyverse’ was built under R version 3.6.3

1. What do the extra and fill arguments do in separate()? Experiment with the various options for the following two toy datasets.

tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"))
Expected 3 pieces. Additional pieces discarded in 1 rows [2].
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"))
Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [2].

The extra argument tells separate() what to do if there are too many pieces, and the fill argument tells it what to do if there aren’t enough. By default, separate() drops extra values with a warning.

tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"))
Expected 3 pieces. Additional pieces discarded in 1 rows [2].

Adding the argument extra = "drop" produces the same result as above, but without the warning.

tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"), extra = "drop")

With extra = "merge", the extra values are not split, so "f,g" appears in column three.

tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"), extra = "merge")

In this example, one of the values, "d,e", has too few elements. The default for fill is similar to the default for extra: it fills columns with missing values but emits a warning. In this example, the 2nd row of column three is NA.

tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"))
Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [2].

An alternative option is fill = "right", which fills with missing values from the right, but without a warning.

tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"), fill = "right")

The option fill = "left" also fills with missing values without emitting a warning, but this time from the left side. Now, the 2nd row of column one will be missing, and the other values in that row are shifted right.

tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"), fill = "left")
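
The two arguments are independent, so they can be set in the same call. The sketch below uses a hypothetical toy dataset (not one of the exercise's two) in which one row has too many pieces and another has too few; the names mixed and both are made up for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: row 2 has four pieces, row 3 has only two.
mixed <- tibble(x = c("a,b,c", "d,e,f,g", "h,i"))

both <- mixed %>%
  separate(
    x, c("one", "two", "three"),
    extra = "merge", # keep surplus pieces together in the last column
    fill = "right"   # pad short rows with NA on the right
  )

both
```

With both arguments set explicitly, neither warning should be emitted: the surplus "f,g" stays together in column three, and the short row gets NA in column three.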

2. Both unite() and separate() have a remove argument. What does it do? Why would you set it to FALSE?

The remove argument discards the input columns from the resulting data frame; it defaults to TRUE. You would set it to FALSE if you want to create a new variable but keep the old one.
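
As a small illustration of the difference, the sketch below runs separate() and unite() with and without remove = FALSE on hypothetical toy data. The object names (df, dropped, kept, united_kept) are invented for this example.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data.
df <- tibble(x = c("X_1", "X_2", "Y_1"))

# Default behaviour: the input column x is dropped from the result.
dropped <- df %>%
  separate(x, c("variable", "id"), sep = "_")

# remove = FALSE: x is kept alongside the new columns.
kept <- df %>%
  separate(x, c("variable", "id"), sep = "_", remove = FALSE)

# The same argument works for unite(): the inputs survive next to the new column.
united_kept <- dropped %>%
  unite(x, variable, id, sep = "_", remove = FALSE)

names(dropped)
names(kept)
names(united_kept)
```

Keeping the original column is handy when you want to check the split against the raw values, or when later steps still need the combined form.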

3. Compare and contrast separate() and extract(). Why are there three variations of separation (by position, by separator, and with groups), but only one unite?

The function separate() splits a column into multiple columns by separator, if the sep argument is a character vector, or by character positions, if sep is numeric.

# example with separators
tibble(x = c("X_1", "X_2", "AA_1", "AA_2")) %>%
  separate(x, c("variable", "into"), sep = "_")

# example with position
tibble(x = c("X1", "X2", "Y1", "Y2")) %>%
  separate(x, c("variable", "into"), sep = c(1))
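
According to the tidyr documentation, a character sep is interpreted as a regular expression, so one pattern can cover more than one delimiter. A minimal sketch on hypothetical data that mixes "_" and "-" (the name mixed_sep is invented here):

```r
library(tidyr)
library(tibble)

# Hypothetical toy data with two different delimiters.
mixed_sep <- tibble(x = c("X_1", "X-2", "AA_1", "AA-2")) %>%
  separate(x, c("variable", "id"), sep = "_|-") # regex: split on "_" or "-"

mixed_sep
```

Because sep is a regex, characters that have a special meaning in regular expressions (such as ".") would need to be escaped to be matched literally.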

The function extract() uses a regular expression to specify groups in a character vector and splits that single character vector into multiple columns. This is more flexible than separate() because it does not require a common separator or specific column positions.

# example with separators
tibble(x = c("X_1", "X_2", "AA_1", "AA_2")) %>%
  extract(x, c("variable", "id"), regex = "([A-Z]+)_([0-9]+)")

# example with position
tibble(x = c("X1", "X2", "Y1", "Y2")) %>%
  extract(x, c("variable", "id"), regex = "([A-Z])([0-9])")

# example that separate could not parse
tibble(x = c("X1", "X20", "AA11", "AA2")) %>%
  extract(x, c("variable", "id"), regex = "([A-Z]+)([0-9]+)")
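
Two further details of extract() may be worth a quick sketch: as documented in tidyr, rows where the groups do not match yield NA, and convert = TRUE runs type conversion on the new columns. The data and the name parsed below are hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data; the last value does not match the pattern.
parsed <- tibble(x = c("X1", "AA20", "??")) %>%
  extract(
    x, c("variable", "id"),
    regex = "([A-Z]+)([0-9]+)",
    convert = TRUE # convert the digit strings in id to integers
  )

parsed
```

The non-matching row ends up as NA in both new columns rather than raising an error.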

Both separate() and extract() convert a single column to many columns. However, unite() converts many columns to one, with a choice of a separator to include between column values.

tibble(variable = c("X", "X", "Y", "Y"), id = c(1, 2, 1, 2)) %>%
  unite(x, variable, id, sep = "_")

In other words, with extract() and separate() only one column can be chosen, but there are many choices for how to split that single column into different columns. With unite(), there are many choices as to which columns to include, but only one choice as to how to combine their contents into a single vector.
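
To make the asymmetry concrete, here is a minimal sketch in which unite() combines three columns instead of two; only the column selection changes, while the combining rule stays "paste with sep". The data and the names wide and united are hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data with three columns to combine.
wide <- tibble(
  variable = c("X", "Y"),
  group = c("a", "b"),
  id = c(1, 2)
)

# Any number of columns can be listed; the values are joined with sep.
united <- wide %>%
  unite(x, variable, group, id, sep = "_")

united
```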
