Creating list-columns

suppressPackageStartupMessages(library("modelr"))

suppressPackageStartupMessages(library("tidyverse"))

suppressPackageStartupMessages(library("gapminder"))

1. List all the functions that you can think of that take an atomic vector and return a list.

Many functions in the stringr package take a character vector as input and return a list.

str_split(sentences[1:3], " ")

[[1]]
[1] "The"     "birch"   "canoe"   "slid"    "on"      "the"     "smooth"  "planks."

[[2]]
[1] "Glue"        "the"         "sheet"       "to"          "the"         "dark"        "blue"       
[8] "background."

[[3]]
[1] "It's"  "easy"  "to"    "tell"  "the"   "depth" "of"    "a"     "well."

str_match_all(c("abc", "aa", "aabaa", "abbbc"), "a+")

[[1]]
     [,1]
[1,] "a" 

[[2]]
     [,1]
[1,] "aa"

[[3]]
     [,1]
[1,] "aa"
[2,] "aa"

[[4]]
     [,1]
[1,] "a"

The map() function takes a vector and always returns a list.

map(1:3, runif)

[[1]]
[1] 0.4197749

[[2]]
[1] 0.6350175 0.5589084

[[3]]
[1] 0.03797927 0.41333901 0.85545111
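
Base R has functions of this kind as well. The following is a short sketch, not an exhaustive list; the object names are illustrative only.

```r
# Base R functions that take an atomic vector and return a list.
x <- c(a = 1, b = 2, c = 3)

# as.list() turns each element of the vector into a list element.
as_list_result <- as.list(x)

# split() divides a vector into groups defined by a second vector.
split_result <- split(1:6, c(1, 1, 2, 2, 3, 3))

# strsplit() is the base R counterpart of str_split().
strsplit_result <- strsplit(c("a b", "c d e"), " ")

# lapply() applies a function to each element and always returns a list.
lapply_result <- lapply(1:3, function(i) i^2)
```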

2. Brainstorm useful summary functions that, like `quantile()`, return multiple values.

Some examples of summary functions that return multiple values are the following.

range(mtcars$mpg)

[1] 10.4 33.9

fivenum(mtcars$mpg)

[1] 10.40 15.35 19.20 22.80 33.90

boxplot.stats(mtcars$mpg)

$stats
[1] 10.40 15.35 19.20 22.80 33.90

$n
[1] 32

$conf
[1] 17.11916 21.28084

$out
numeric(0)
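
Because these functions return more than one value, their results can be stored per group by wrapping them in list() inside summarise(). A minimal sketch, assuming dplyr is loaded; the names mpg_ranges and mpg_range are made up for illustration.

```r
library(dplyr)

# Store the two values returned by range() in a list-column, one entry per group.
mpg_ranges <- mtcars %>%
  group_by(cyl) %>%
  summarise(mpg_range = list(range(mpg)))

# Each entry of the list-column is a numeric vector of length two.
mpg_ranges$mpg_range[[1]]
```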

3. What’s missing in the following data frame? How does `quantile()` return that missing piece? Why isn’t that helpful here?

mtcars %>%
  group_by(cyl) %>%
  summarise(q = list(quantile(mpg))) %>%
  unnest(cols = c(q))

The labels that say which quantile each value is (0%, 25%, 50%, 75%, 100%) are missing. quantile() returns these as the names of the vector.

quantile(mtcars$mpg)

    0%    25%    50%    75%   100% 
10.400 15.425 19.200 22.800 33.900

Since unnest() drops the names of the vector, the labels are lost, and the unnested data frame does not say which quantile each row holds.
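
One possible way to keep the labels, offered as a sketch rather than as part of the exercise, is to convert the named vector into a two-column tibble with tibble::enframe() before unnesting. The column names quantile and mpg and the object name quantiles_by_cyl are illustrative choices, and the code assumes tidyr 1.0.0 or later for the cols argument.

```r
library(dplyr)
library(tidyr)
library(tibble)

# enframe() moves the names of the vector into a column of their own,
# so they survive unnest().
quantiles_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(q = list(enframe(quantile(mpg), name = "quantile", value = "mpg"))) %>%
  unnest(cols = c(q))

quantiles_by_cyl
```

The result has one row per combination of cyl and quantile, with the quantile label in its own column.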

4. What does this code do? Why might it be useful?

mtcars %>%
  group_by(cyl) %>%
  summarise_each(funs(list))

It creates a data frame in which each row corresponds to a value of cyl, and each cell of every other column holds a vector of all the values of that column for that value of cyl. In other words, every column other than cyl becomes a list-column. It seems like it should be useful to have all the observations of each variable for each group, but off the top of my head, I can’t think of a specific use for this. It does seem that it could do many of the things that dplyr::do() does.
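
Both summarise_each() and funs() are deprecated in current dplyr. The following sketch shows what should be an equivalent call with across(), assuming dplyr 1.0.0 or later, followed by one possible use of the resulting list-columns. The names cyl_lists, cyl_means, and mean_mpg are illustrative only.

```r
library(dplyr)

# Collapse every non-grouping column into a list-column, one entry per cyl.
cyl_lists <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(everything(), ~ list(.x)))

# Each list-column entry can then be processed per group,
# for example to compute the mean mpg within each cyl.
cyl_means <- cyl_lists %>%
  mutate(mean_mpg = purrr::map_dbl(mpg, mean))
```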
