complete(): Complete a data frame with missing combinations of data.
Description:
Turns implicit missing values into explicit missing values. This is a wrapper around expand(), dplyr::left_join() and replace_na() that’s useful for completing missing combinations of data.
Usage:
complete(data, ..., fill = list())
Example:
library(dplyr, warn.conflicts = FALSE)
df <- tibble(
group = c(1:2, 1),
item_id = c(1:2, 2),
item_name = c("a", "b", "b"),
value1 = 1:3,
value2 = 4:6
)
df %>% complete(group, nesting(item_id, item_name))
# You can also choose to fill in missing values
df %>% complete(group, nesting(item_id, item_name), fill = list(value1 = 0))
drop_na(): Drop rows containing missing values
Description:
Drop rows containing missing values
Usage:
drop_na(data, ...)
Example: library(dplyr) df <- tibble(x = c(1, 2, NA), y = c(“a”, NA, “b”)) df %>% drop_na() df %>% drop_na(x)
expand(): Expand data frame to include all combinations of values
Description:
expand() is often useful in conjunction with left_join if you want to convert implicit missing values to explicit missing values. Or you can use it in conjunction with anti_join() to figure out which combinations are missing.
Usage:
expand(data, ...)
Example:
library(dplyr)
# All possible combinations of vs & cyl, even those that aren't
# present in the data
expand(mtcars, vs, cyl)
extract(): Extract one column into multiple columns.
Description:
Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don’t match, or the input is NA, the output will be NA.
Usage:
extract(data, col, into, regex = "([[:alnum:]]+)", remove = TRUE,
convert = FALSE, ...)
Example:
library(dplyr)
df <- data.frame(x = c(NA, "a-b", "a-d", "b-c", "d-e"))
df %>% extract(x, "A")
df %>% extract(x, c("A", "B"), "([[:alnum:]]+)-([[:alnum:]]+)")
fill(): Fill in missing values.
Description:
Fills missing values in using the previous entry. This is useful in the common output format where values are not repeated, they’re recorded each time they change.
Usage:
fill(data, ..., .direction = c("down", "up"))
Example:
df <- data.frame(Month = 1:12, Year = c(2000, rep(NA, 11)))
df %>% fill(Year)
gather(): Gather columns into key-value pairs.
Description:
Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed. You use gather() when you notice that you have columns that are not variables.
Usage:
gather(data, key = "key", value = "value", ..., na.rm = FALSE,
convert = FALSE, factor_key = FALSE)
Example:
library(dplyr)
# From http://stackoverflow.com/questions/1181060
stocks <- tibble(
time = as.Date('2009-01-01') + 0:9,
X = rnorm(10, 0, 1),
Y = rnorm(10, 0, 2),
Z = rnorm(10, 0, 4)
) gather(stocks, stock, price, -time) stocks %>% gather(stock, price, -time)
nest() Nest repeated values in a list-variable.
Description:
There are many possible ways one could choose to nest columns inside a data frame. nest() creates a list of data frames containing all the nested variables: this seems to be the most useful form in practice.
Usage:
nest(data, ..., .key = "data")
Example:
library(dplyr) as_tibble(iris) %>% nest(-Species) as_tibble(chickwts) %>% nest(weight) if (require(“gapminder”)) { gapminder %>% group_by(country, continent) %>% nest() gapminder %>% nest(-country, -continent) }
replace_na() Replace missing values
Description:
Replace missing values
Usage:
replace_na(data, replace, ...)
Example:
library(dplyr)
df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"), z = list(1:5, NULL, 10:20))
df %>% replace_na(list(x = 0, y = "unknown"))
# NULL are the list-col equivalent of NAs
df %>% replace_na(list(z = list(5)))
df$x %>% replace_na(0)
df$y %>% replace_na("unknown")
separate(): Separate one column into multiple columns.
Description:
Given either regular expression or a vector of character positions, separate() turns a single char- acter column into multiple columns.
Usage:
separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE,
convert = FALSE, extra = "warn", fill = "warn", ...)
Example:
library(dplyr)
df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
df %>% separate(x, c("A", "B"))
# If every row doesn't split into the same number of pieces, use
# the extra and file arguments to control what happens
df <- data.frame(x = c("a", "a b", "a b c", NA))
df %>% separate(x, c("a", "b"))
# The same behaviour but no warnings
df %>% separate(x, c("a", "b"), extra = "drop", fill = "right")
# Another option:
df %>% separate(x, c("a", "b"), extra = "merge", fill = "left")
# If only want to split specified number of times use extra = "merge"
df <- data.frame(x = c("x: 123", "y: error: 7"))
df %>% separate(x, c("key", "value"), ": ", extra = "merge")