Let’s look at some very useful but lesser known dplyr functions. unite function collapse two separate column into one column whereas separate breaks one column into two new ones. The function is by defined as unite(data, col, …, sep = “_“, remove = TRUE)
library(tidyverse)
mtcars%>%head()
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
unite_(mtcars, "vs_am", c("vs","am"))%>%head()
## mpg cyl disp hp drat wt qsec vs_am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0_1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0_1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1_1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1_0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0_0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1_0 3 1
unite_(mtcars, "vs:am", c("vs","am"),sep=":")%>%head()
## mpg cyl disp hp drat wt qsec vs:am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0:1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0:1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1:1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1:0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0:0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1:0 3 1
# Separate is the complement of unite
mtcars %>%
unite(vs_am, vs, am) %>%
separate(vs_am, c("vs", "am"))%>%head()
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Extract columns by name. Also select_if()
select_all(mtcars, funs(toupper(.)))%>%head()
## MPG CYL DISP HP DRAT WT QSEC VS AM GEAR CARB
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# collect names of all categorical variables
vars <-function(x) names(x)[grepl('Sepal', names(x))]
iris[,vars(iris)]%>%head()
## Sepal.Length Sepal.Width
## 1 5.1 3.5
## 2 4.9 3.0
## 3 4.7 3.2
## 4 4.6 3.1
## 5 5.0 3.6
## 6 5.4 3.9
select_if(iris, is.factor, toupper)%>%head()
## SPECIES
## 1 setosa
## 2 setosa
## 3 setosa
## 4 setosa
## 5 setosa
## 6 setosa
#select_if(iris, vars, toupper)
# Selection drops unselected variables:
is_whole <- function(x) all(floor(x) == x)
select_if(mtcars, is_whole, toupper)%>%head()
## CYL HP VS AM GEAR CARB
## Mazda RX4 6 110 0 1 4 4
## Mazda RX4 Wag 6 110 0 1 4 4
## Datsun 710 4 93 1 1 4 1
## Hornet 4 Drive 6 110 1 0 3 1
## Hornet Sportabout 8 175 0 0 3 2
## Valiant 6 105 1 0 3 1
# But renaming retains them:
rename_if(mtcars, is_whole, toupper)%>%head()
## mpg CYL disp HP drat wt qsec VS AM GEAR CARB
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# The renaming function is optional for selection:
select_if(mtcars, is_whole)%>%head()
## cyl hp vs am gear carb
## Mazda RX4 6 110 0 1 4 4
## Mazda RX4 Wag 6 110 0 1 4 4
## Datsun 710 4 93 1 1 4 1
## Hornet 4 Drive 6 110 1 0 3 1
## Hornet Sportabout 8 175 0 0 3 2
## Valiant 6 105 1 0 3 1
#Use these helpers with select (),
select(iris, starts_with("Sepal"))%>%head()
## Sepal.Length Sepal.Width
## 1 5.1 3.5
## 2 4.9 3.0
## 3 4.7 3.2
## 4 4.6 3.1
## 5 5.0 3.6
## 6 5.4 3.9
select(iris, ends_with("Width"))%>%head()
## Sepal.Width Petal.Width
## 1 3.5 0.2
## 2 3.0 0.2
## 3 3.2 0.2
## 4 3.1 0.2
## 5 3.6 0.2
## 6 3.9 0.4
nms <- iris%>%head(3)
select(nms, contains("Sepal"))
## Sepal.Length Sepal.Width
## 1 5.1 3.5
## 2 4.9 3.0
## 3 4.7 3.2
select(iris, contains("Petal"))
## Petal.Length Petal.Width
## 1 1.4 0.2
## 2 1.4 0.2
## 3 1.3 0.2
## 4 1.5 0.2
## 5 1.4 0.2
## 6 1.7 0.4
## 7 1.4 0.3
## 8 1.5 0.2
## 9 1.4 0.2
## 10 1.5 0.1
## 11 1.5 0.2
## 12 1.6 0.2
## 13 1.4 0.1
## 14 1.1 0.1
## 15 1.2 0.2
## 16 1.5 0.4
## 17 1.3 0.4
## 18 1.4 0.3
## 19 1.7 0.3
## 20 1.5 0.3
## 21 1.7 0.2
## 22 1.5 0.4
## 23 1.0 0.2
## 24 1.7 0.5
## 25 1.9 0.2
## 26 1.6 0.2
## 27 1.6 0.4
## 28 1.5 0.2
## 29 1.4 0.2
## 30 1.6 0.2
## 31 1.6 0.2
## 32 1.5 0.4
## 33 1.5 0.1
## 34 1.4 0.2
## 35 1.5 0.2
## 36 1.2 0.2
## 37 1.3 0.2
## 38 1.4 0.1
## 39 1.3 0.2
## 40 1.5 0.2
## 41 1.3 0.3
## 42 1.3 0.3
## 43 1.3 0.2
## 44 1.6 0.6
## 45 1.9 0.4
## 46 1.4 0.3
## 47 1.6 0.2
## 48 1.4 0.2
## 49 1.5 0.2
## 50 1.4 0.2
## 51 4.7 1.4
## 52 4.5 1.5
## 53 4.9 1.5
## 54 4.0 1.3
## 55 4.6 1.5
## 56 4.5 1.3
## 57 4.7 1.6
## 58 3.3 1.0
## 59 4.6 1.3
## 60 3.9 1.4
## 61 3.5 1.0
## 62 4.2 1.5
## 63 4.0 1.0
## 64 4.7 1.4
## 65 3.6 1.3
## 66 4.4 1.4
## 67 4.5 1.5
## 68 4.1 1.0
## 69 4.5 1.5
## 70 3.9 1.1
## 71 4.8 1.8
## 72 4.0 1.3
## 73 4.9 1.5
## 74 4.7 1.2
## 75 4.3 1.3
## 76 4.4 1.4
## 77 4.8 1.4
## 78 5.0 1.7
## 79 4.5 1.5
## 80 3.5 1.0
## 81 3.8 1.1
## 82 3.7 1.0
## 83 3.9 1.2
## 84 5.1 1.6
## 85 4.5 1.5
## 86 4.5 1.6
## 87 4.7 1.5
## 88 4.4 1.3
## 89 4.1 1.3
## 90 4.0 1.3
## 91 4.4 1.2
## 92 4.6 1.4
## 93 4.0 1.2
## 94 3.3 1.0
## 95 4.2 1.3
## 96 4.2 1.2
## 97 4.2 1.3
## 98 4.3 1.3
## 99 3.0 1.1
## 100 4.1 1.3
## 101 6.0 2.5
## 102 5.1 1.9
## 103 5.9 2.1
## 104 5.6 1.8
## 105 5.8 2.2
## 106 6.6 2.1
## 107 4.5 1.7
## 108 6.3 1.8
## 109 5.8 1.8
## 110 6.1 2.5
## 111 5.1 2.0
## 112 5.3 1.9
## 113 5.5 2.1
## 114 5.0 2.0
## 115 5.1 2.4
## 116 5.3 2.3
## 117 5.5 1.8
## 118 6.7 2.2
## 119 6.9 2.3
## 120 5.0 1.5
## 121 5.7 2.3
## 122 4.9 2.0
## 123 6.7 2.0
## 124 4.9 1.8
## 125 5.7 2.1
## 126 6.0 1.8
## 127 4.8 1.8
## 128 4.9 1.8
## 129 5.6 2.1
## 130 5.8 1.6
## 131 6.1 1.9
## 132 6.4 2.0
## 133 5.6 2.2
## 134 5.1 1.5
## 135 5.6 1.4
## 136 6.1 2.3
## 137 5.6 2.4
## 138 5.5 1.8
## 139 4.8 1.8
## 140 5.4 2.1
## 141 5.6 2.4
## 142 5.1 2.3
## 143 5.1 1.9
## 144 5.9 2.3
## 145 5.7 2.5
## 146 5.2 2.3
## 147 5.0 1.9
## 148 5.2 2.0
## 149 5.4 2.3
## 150 5.1 1.8
select(nms, starts_with("Petal"))
## Petal.Length Petal.Width
## 1 1.4 0.2
## 2 1.4 0.2
## 3 1.3 0.2
select(nms, ends_with("Width"))
## Sepal.Width Petal.Width
## 1 3.5 0.2
## 2 3.0 0.2
## 3 3.2 0.2
select(nms, contains("etal"))
## Petal.Length Petal.Width
## 1 1.4 0.2
## 2 1.4 0.2
## 3 1.3 0.2
select(nms, matches(".t."))
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 5.1 3.5 1.4 0.2
## 2 4.9 3.0 1.4 0.2
## 3 4.7 3.2 1.3 0.2
select(nms, Petal.Length, Petal.Width)
## Petal.Length Petal.Width
## 1 1.4 0.2
## 2 1.4 0.2
## 3 1.3 0.2
select(nms, everything())
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
vars <- c("Petal.Length", "Petal.Width")
select(nms, one_of(vars))
## Petal.Length Petal.Width
## 1 1.4 0.2
## 2 1.4 0.2
## 3 1.3 0.2
library(tidyselect)
vars_select(names(iris), last_col())
## Species
## "Species"
vars_select(names(iris), last_col(offset = 2))
## Petal.Length
## "Petal.Length"
# Keep variables
vars_select(names(iris), everything())
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
vars_select(names(iris), starts_with("Petal"))
## Petal.Length Petal.Width
## "Petal.Length" "Petal.Width"
vars_select(names(iris), ends_with("Width"))
## Sepal.Width Petal.Width
## "Sepal.Width" "Petal.Width"
vars_select(names(iris), contains("etal"))
## Petal.Length Petal.Width
## "Petal.Length" "Petal.Width"
vars_select(names(iris), matches(".t."))
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
vars_select(names(iris), Petal.Length, Petal.Width)
## Petal.Length Petal.Width
## "Petal.Length" "Petal.Width"
vars_select(names(iris), one_of("Petal.Length", "Petal.Width"))
## Petal.Length Petal.Width
## "Petal.Length" "Petal.Width"
# Drop variables
vars_select(names(iris), -starts_with("Petal"))
## Sepal.Length Sepal.Width Species
## "Sepal.Length" "Sepal.Width" "Species"
vars_select(names(iris), -ends_with("Width"))
## Sepal.Length Petal.Length Species
## "Sepal.Length" "Petal.Length" "Species"
vars_select(names(iris), -contains("etal"))
## Sepal.Length Sepal.Width Species
## "Sepal.Length" "Sepal.Width" "Species"
vars_select(names(iris), -matches(".t."))
## Species
## "Species"
vars_select(names(iris), -Petal.Length, -Petal.Width)
## Sepal.Length Sepal.Width Species
## "Sepal.Length" "Sepal.Width" "Species"
# Rename variables
vars_select(names(iris), petal_length = Petal.Length)
## petal_length
## "Petal.Length"
vars_select(names(iris), petal = starts_with("Petal"))
## petal1 petal2
## "Petal.Length" "Petal.Width"
# Rename variables preserving all existing
vars_rename(names(iris), petal_length = Petal.Length)
## Sepal.Length Sepal.Width petal_length Petal.Width Species
## "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
# You can unquote symbols or quosures
vars_select(names(iris), !! quote(Petal.Length))
## Petal.Length
## "Petal.Length"
# And unquote-splice lists of symbols or quosures
vars_select(names(iris), !!! list(quo(Petal.Length), quote(Petal.Width)))
## Petal.Length Petal.Width
## "Petal.Length" "Petal.Width"
Compute new column(s).
mutate(mtcars, gpm = 1/mpg)%>%head()
## mpg cyl disp hp drat wt qsec vs am gear carb gpm
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0.04761905
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0.04761905
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 0.04385965
## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 0.04672897
## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0.05347594
## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 0.05524862
Compute new column(s), drop others.
transmute(mtcars, gpm = 1/mpg)%>%head()
## gpm
## 1 0.04761905
## 2 0.04761905
## 3 0.04385965
## 4 0.04672897
## 5 0.05347594
## 6 0.05524862
column. Use with funs().
f=faithful%>%head(4)
mutate_all(f, funs(log(.), log2(.)))
## eruptions waiting eruptions_log waiting_log eruptions_log2 waiting_log2
## 1 3.600 79 1.2809338 4.369448 1.8479969 6.303781
## 2 1.800 54 0.5877867 3.988984 0.8479969 5.754888
## 3 3.333 74 1.2038728 4.304065 1.7368213 6.209453
## 4 2.283 62 0.8254904 4.127134 1.1909309 5.954196
Apply funs to all columns of one type. Use with funs().
mutate_if(iris, is.numeric, funs(log(.)))%>%head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 1.629241 1.252763 0.3364722 -1.6094379 setosa
## 2 1.589235 1.098612 0.3364722 -1.6094379 setosa
## 3 1.547563 1.163151 0.2623643 -1.6094379 setosa
## 4 1.526056 1.131402 0.4054651 -1.6094379 setosa
## 5 1.609438 1.280934 0.3364722 -1.6094379 setosa
## 6 1.686399 1.360977 0.5306283 -0.9162907 setosa
specific columns. Use with funs(), vars() and the helper functions for select().
# Take log of all columns except Species
mutate_at(iris, vars( -Species), funs(log(.)))%>%head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 1.629241 1.252763 0.3364722 -1.6094379 setosa
## 2 1.589235 1.098612 0.3364722 -1.6094379 setosa
## 3 1.547563 1.163151 0.2623643 -1.6094379 setosa
## 4 1.526056 1.131402 0.4054651 -1.6094379 setosa
## 5 1.609438 1.280934 0.3364722 -1.6094379 setosa
## 6 1.686399 1.360977 0.5306283 -0.9162907 setosa
Add new column(s).
add_column(mtcars, new = 1:32)%>%head(4)
## mpg cyl disp hp drat wt qsec vs am gear carb new
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 4
filter(iris, Sepal.Length > 7)%>%head(4)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 7.1 3.0 5.9 2.1 virginica
## 2 7.6 3.0 6.6 2.1 virginica
## 3 7.3 2.9 6.3 1.8 virginica
## 4 7.2 3.6 6.1 2.5 virginica
distinct(iris, Species)%>%head()
## Species
## 1 setosa
## 2 versicolor
## 3 virginica
Randomly select fraction of rows.
sample_frac(iris, 0.5, replace = TRUE)%>%head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 19 5.7 3.8 1.7 0.3 setosa
## 56 5.7 2.8 4.5 1.3 versicolor
## 49 5.3 3.7 1.5 0.2 setosa
## 133 6.4 2.8 5.6 2.2 virginica
## 128 6.1 3.0 4.9 1.8 virginica
## 62 5.9 3.0 4.2 1.5 versicolor
Randomly select size rows.
sample_n(iris, 10, replace = TRUE)%>%head(4)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 53 6.9 3.1 4.9 1.5 versicolor
## 82 5.5 2.4 3.7 1.0 versicolor
## 44 5.0 3.5 1.6 0.6 setosa
## 58 4.9 2.4 3.3 1.0 versicolor
dplyr::slice(iris, 10:15)
## # A tibble: 6 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 4.90 3.10 1.50 0.100 setosa
## 2 5.40 3.70 1.50 0.200 setosa
## 3 4.80 3.40 1.60 0.200 setosa
## 4 4.80 3.00 1.40 0.100 setosa
## 5 4.30 3.00 1.10 0.100 setosa
## 6 5.80 4.00 1.20 0.200 setosa
top_n(iris, 3, Sepal.Width)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.7 4.4 1.5 0.4 setosa
## 2 5.2 4.1 1.5 0.1 setosa
## 3 5.5 4.2 1.4 0.2 setosa
#select the bottom three rows
top_n(iris, -3, Sepal.Width)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.0 2.0 3.5 1.0 versicolor
## 2 6.0 2.2 4.0 1.0 versicolor
## 3 6.2 2.2 4.5 1.5 versicolor
## 4 6.0 2.2 5.0 1.5 virginica
arrange(mtcars, mpg)%>%head(4)
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
## 2 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
## 3 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
## 4 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
arrange(mtcars, desc(mpg))%>%head(4)
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## 2 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## 3 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## 4 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Add one or more rows to a table.
add_row(faithful, eruptions = 1, waiting = 1)%>%tail(3)
## eruptions waiting
## 271 1.817 46
## 272 4.467 74
## 273 1.000 1
Compute table of summaries. Also summarise_().
summarise(mtcars, avg = mean(mpg))
## avg
## 1 20.09062
Count number of rows in each group defined by the variables in … Also tally().
count(iris, Species)
## # A tibble: 3 x 2
## Species n
## <fct> <int>
## 1 setosa 50
## 2 versicolor 50
## 3 virginica 50
#dplyr::tally(iris, Species)
mtcars %>%
group_by(cyl) %>%
summarise(avg = mean(mpg))
## # A tibble: 3 x 2
## cyl avg
## <dbl> <dbl>
## 1 4.00 26.7
## 2 6.00 19.7
## 3 8.00 15.1
Returns ungrouped copy of table.
g_iris=mtcars[1:10,] %>%
group_by(cyl)
ungroup(g_iris)%>%head(4)
## # A tibble: 4 x 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21.0 6.00 160 110 3.90 2.62 16.5 0 1.00 4.00 4.00
## 2 21.0 6.00 160 110 3.90 2.88 17.0 0 1.00 4.00 4.00
## 3 22.8 4.00 108 93.0 3.85 2.32 18.6 1.00 1.00 4.00 1.00
## 4 21.4 6.00 258 110 3.08 3.22 19.4 1.00 0 3.00 1.00
#OFFSETS
set.seed(1)
#x=ceiling(rnorm(10))
x=1:10
diff(x)
## [1] 1 1 1 1 1 1 1 1 1
dplyr::lag(x) # - Offset elements by 1
## [1] NA 1 2 3 4 5 6 7 8 9
dplyr::lead(x) #- Offset elements by -1
## [1] 2 3 4 5 6 7 8 9 10 NA
#CUMULATIVE AGGREGATES
dplyr::cumall(x) # - Cumulative all()
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
dplyr::cumany(x) #- Cumulative any()
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
cummax(x) #- Cumulative max()
## [1] 1 2 3 4 5 6 7 8 9 10
dplyr::cummean(x) #- Cumulative mean()
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
cummin(x) #- Cumulative min()
## [1] 1 1 1 1 1 1 1 1 1 1
cumprod(x) #- Cumulative prod()
## [1] 1 2 6 24 120 720 5040 40320
## [9] 362880 3628800
cumsum(x) #- Cumulative sum()
## [1] 1 3 6 10 15 21 28 36 45 55
x=c(1:3,3,2,4)
x1=1:5
dplyr::cume_dist(x1) #- Proportion of all values <=
## [1] 0.2 0.4 0.6 0.8 1.0
dplyr::dense_rank(x) #- rank with ties = min, no gaps
## [1] 1 2 3 3 2 4
dplyr::min_rank(x) #- rank with ties = min
## [1] 1 2 4 4 2 6
dplyr::ntile(x,n=5) #- bins into n bins
## [1] 1 1 3 4 2 5
dplyr::percent_rank(x1) #- min_rank scaled to [0,1]
## [1] 0.00 0.25 0.50 0.75 1.00
dplyr::row_number(x) #- rank with ties = "first"
## [1] 1 2 4 5 3 6
x <- rnorm(10)
x[between(x, -1, 1)]
## [1] -0.6264538 0.1836433 -0.8356286 0.3295078 -0.8204684 0.4874291
## [7] 0.7383247 0.5757814 -0.3053884
between(x, -1, 1)
## [1] TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
#dplyr::between() #- x >= left & x <= right
# dplyr::case_when() - multi-case if_else()
# dplyr::coalesce() - first non-NA values by element across a set of vectors
# dplyr::if_else() - element-wise if() + else()
# dplyr::na_if() - replace specific values with NA
# pmax() - element-wise max()
# pmin() - element-wise min()
# dplyr::recode() - Vectorized switch()
# dplyr::recode_factor() #- Vectorized switch() for factors
#COUNTS
#carriers <- group_by(flights, carrier)
#summarise(carriers, n())
iris%>%group_by(Species) %>%dplyr::summarise(n()) #- number of values/rows
## # A tibble: 3 x 2
## Species `n()`
## <fct> <int>
## 1 setosa 50
## 2 versicolor 50
## 3 virginica 50
dplyr::n_distinct(iris$Species) # - # of uniques
## [1] 3
sum(!is.na(iris)) #- # of non-NA’s
## [1] 750
length(unique(iris$Species))
## [1] 3
iris%>%group_by(Species) %>%dplyr::count() #- number of values/rows
## # A tibble: 3 x 2
## # Groups: Species [3]
## Species n
## <fct> <int>
## 1 setosa 50
## 2 versicolor 50
## 3 virginica 50
Move row names into col.
a <- rownames_to_column(iris, var
= "row_number")
a%>%head()
## row_number Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 1 5.1 3.5 1.4 0.2 setosa
## 2 2 4.9 3.0 1.4 0.2 setosa
## 3 3 4.7 3.2 1.3 0.2 setosa
## 4 4 4.6 3.1 1.5 0.2 setosa
## 5 5 5.0 3.6 1.4 0.2 setosa
## 6 6 5.4 3.9 1.7 0.4 setosa
Move col in row names.
column_to_rownames(a, var = "row_number")%>%head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
remove_rownames(a)
## row_number Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 1 5.1 3.5 1.4 0.2
## 2 2 4.9 3.0 1.4 0.2
## 3 3 4.7 3.2 1.3 0.2
## 4 4 4.6 3.1 1.5 0.2
## 5 5 5.0 3.6 1.4 0.2
## 6 6 5.4 3.9 1.7 0.4
## 7 7 4.6 3.4 1.4 0.3
## 8 8 5.0 3.4 1.5 0.2
## 9 9 4.4 2.9 1.4 0.2
## 10 10 4.9 3.1 1.5 0.1
## 11 11 5.4 3.7 1.5 0.2
## 12 12 4.8 3.4 1.6 0.2
## 13 13 4.8 3.0 1.4 0.1
## 14 14 4.3 3.0 1.1 0.1
## 15 15 5.8 4.0 1.2 0.2
## 16 16 5.7 4.4 1.5 0.4
## 17 17 5.4 3.9 1.3 0.4
## 18 18 5.1 3.5 1.4 0.3
## 19 19 5.7 3.8 1.7 0.3
## 20 20 5.1 3.8 1.5 0.3
## 21 21 5.4 3.4 1.7 0.2
## 22 22 5.1 3.7 1.5 0.4
## 23 23 4.6 3.6 1.0 0.2
## 24 24 5.1 3.3 1.7 0.5
## 25 25 4.8 3.4 1.9 0.2
## 26 26 5.0 3.0 1.6 0.2
## 27 27 5.0 3.4 1.6 0.4
## 28 28 5.2 3.5 1.5 0.2
## 29 29 5.2 3.4 1.4 0.2
## 30 30 4.7 3.2 1.6 0.2
## 31 31 4.8 3.1 1.6 0.2
## 32 32 5.4 3.4 1.5 0.4
## 33 33 5.2 4.1 1.5 0.1
## 34 34 5.5 4.2 1.4 0.2
## 35 35 4.9 3.1 1.5 0.2
## 36 36 5.0 3.2 1.2 0.2
## 37 37 5.5 3.5 1.3 0.2
## 38 38 4.9 3.6 1.4 0.1
## 39 39 4.4 3.0 1.3 0.2
## 40 40 5.1 3.4 1.5 0.2
## 41 41 5.0 3.5 1.3 0.3
## 42 42 4.5 2.3 1.3 0.3
## 43 43 4.4 3.2 1.3 0.2
## 44 44 5.0 3.5 1.6 0.6
## 45 45 5.1 3.8 1.9 0.4
## 46 46 4.8 3.0 1.4 0.3
## 47 47 5.1 3.8 1.6 0.2
## 48 48 4.6 3.2 1.4 0.2
## 49 49 5.3 3.7 1.5 0.2
## 50 50 5.0 3.3 1.4 0.2
## 51 51 7.0 3.2 4.7 1.4
## 52 52 6.4 3.2 4.5 1.5
## 53 53 6.9 3.1 4.9 1.5
## 54 54 5.5 2.3 4.0 1.3
## 55 55 6.5 2.8 4.6 1.5
## 56 56 5.7 2.8 4.5 1.3
## 57 57 6.3 3.3 4.7 1.6
## 58 58 4.9 2.4 3.3 1.0
## 59 59 6.6 2.9 4.6 1.3
## 60 60 5.2 2.7 3.9 1.4
## 61 61 5.0 2.0 3.5 1.0
## 62 62 5.9 3.0 4.2 1.5
## 63 63 6.0 2.2 4.0 1.0
## 64 64 6.1 2.9 4.7 1.4
## 65 65 5.6 2.9 3.6 1.3
## 66 66 6.7 3.1 4.4 1.4
## 67 67 5.6 3.0 4.5 1.5
## 68 68 5.8 2.7 4.1 1.0
## 69 69 6.2 2.2 4.5 1.5
## 70 70 5.6 2.5 3.9 1.1
## 71 71 5.9 3.2 4.8 1.8
## 72 72 6.1 2.8 4.0 1.3
## 73 73 6.3 2.5 4.9 1.5
## 74 74 6.1 2.8 4.7 1.2
## 75 75 6.4 2.9 4.3 1.3
## 76 76 6.6 3.0 4.4 1.4
## 77 77 6.8 2.8 4.8 1.4
## 78 78 6.7 3.0 5.0 1.7
## 79 79 6.0 2.9 4.5 1.5
## 80 80 5.7 2.6 3.5 1.0
## 81 81 5.5 2.4 3.8 1.1
## 82 82 5.5 2.4 3.7 1.0
## 83 83 5.8 2.7 3.9 1.2
## 84 84 6.0 2.7 5.1 1.6
## 85 85 5.4 3.0 4.5 1.5
## 86 86 6.0 3.4 4.5 1.6
## 87 87 6.7 3.1 4.7 1.5
## 88 88 6.3 2.3 4.4 1.3
## 89 89 5.6 3.0 4.1 1.3
## 90 90 5.5 2.5 4.0 1.3
## 91 91 5.5 2.6 4.4 1.2
## 92 92 6.1 3.0 4.6 1.4
## 93 93 5.8 2.6 4.0 1.2
## 94 94 5.0 2.3 3.3 1.0
## 95 95 5.6 2.7 4.2 1.3
## 96 96 5.7 3.0 4.2 1.2
## 97 97 5.7 2.9 4.2 1.3
## 98 98 6.2 2.9 4.3 1.3
## 99 99 5.1 2.5 3.0 1.1
## 100 100 5.7 2.8 4.1 1.3
## 101 101 6.3 3.3 6.0 2.5
## 102 102 5.8 2.7 5.1 1.9
## 103 103 7.1 3.0 5.9 2.1
## 104 104 6.3 2.9 5.6 1.8
## 105 105 6.5 3.0 5.8 2.2
## 106 106 7.6 3.0 6.6 2.1
## 107 107 4.9 2.5 4.5 1.7
## 108 108 7.3 2.9 6.3 1.8
## 109 109 6.7 2.5 5.8 1.8
## 110 110 7.2 3.6 6.1 2.5
## 111 111 6.5 3.2 5.1 2.0
## 112 112 6.4 2.7 5.3 1.9
## 113 113 6.8 3.0 5.5 2.1
## 114 114 5.7 2.5 5.0 2.0
## 115 115 5.8 2.8 5.1 2.4
## 116 116 6.4 3.2 5.3 2.3
## 117 117 6.5 3.0 5.5 1.8
## 118 118 7.7 3.8 6.7 2.2
## 119 119 7.7 2.6 6.9 2.3
## 120 120 6.0 2.2 5.0 1.5
## 121 121 6.9 3.2 5.7 2.3
## 122 122 5.6 2.8 4.9 2.0
## 123 123 7.7 2.8 6.7 2.0
## 124 124 6.3 2.7 4.9 1.8
## 125 125 6.7 3.3 5.7 2.1
## 126 126 7.2 3.2 6.0 1.8
## 127 127 6.2 2.8 4.8 1.8
## 128 128 6.1 3.0 4.9 1.8
## 129 129 6.4 2.8 5.6 2.1
## 130 130 7.2 3.0 5.8 1.6
## 131 131 7.4 2.8 6.1 1.9
## 132 132 7.9 3.8 6.4 2.0
## 133 133 6.4 2.8 5.6 2.2
## 134 134 6.3 2.8 5.1 1.5
## 135 135 6.1 2.6 5.6 1.4
## 136 136 7.7 3.0 6.1 2.3
## 137 137 6.3 3.4 5.6 2.4
## 138 138 6.4 3.1 5.5 1.8
## 139 139 6.0 3.0 4.8 1.8
## 140 140 6.9 3.1 5.4 2.1
## 141 141 6.7 3.1 5.6 2.4
## 142 142 6.9 3.1 5.1 2.3
## 143 143 5.8 2.7 5.1 1.9
## 144 144 6.8 3.2 5.9 2.3
## 145 145 6.7 3.3 5.7 2.5
## 146 146 6.7 3.0 5.2 2.3
## 147 147 6.3 2.5 5.0 1.9
## 148 148 6.5 3.0 5.2 2.0
## 149 149 6.2 3.4 5.4 2.3
## 150 150 5.9 3.0 5.1 1.8
## Species
## 1 setosa
## 2 setosa
## 3 setosa
## 4 setosa
## 5 setosa
## 6 setosa
## 7 setosa
## 8 setosa
## 9 setosa
## 10 setosa
## 11 setosa
## 12 setosa
## 13 setosa
## 14 setosa
## 15 setosa
## 16 setosa
## 17 setosa
## 18 setosa
## 19 setosa
## 20 setosa
## 21 setosa
## 22 setosa
## 23 setosa
## 24 setosa
## 25 setosa
## 26 setosa
## 27 setosa
## 28 setosa
## 29 setosa
## 30 setosa
## 31 setosa
## 32 setosa
## 33 setosa
## 34 setosa
## 35 setosa
## 36 setosa
## 37 setosa
## 38 setosa
## 39 setosa
## 40 setosa
## 41 setosa
## 42 setosa
## 43 setosa
## 44 setosa
## 45 setosa
## 46 setosa
## 47 setosa
## 48 setosa
## 49 setosa
## 50 setosa
## 51 versicolor
## 52 versicolor
## 53 versicolor
## 54 versicolor
## 55 versicolor
## 56 versicolor
## 57 versicolor
## 58 versicolor
## 59 versicolor
## 60 versicolor
## 61 versicolor
## 62 versicolor
## 63 versicolor
## 64 versicolor
## 65 versicolor
## 66 versicolor
## 67 versicolor
## 68 versicolor
## 69 versicolor
## 70 versicolor
## 71 versicolor
## 72 versicolor
## 73 versicolor
## 74 versicolor
## 75 versicolor
## 76 versicolor
## 77 versicolor
## 78 versicolor
## 79 versicolor
## 80 versicolor
## 81 versicolor
## 82 versicolor
## 83 versicolor
## 84 versicolor
## 85 versicolor
## 86 versicolor
## 87 versicolor
## 88 versicolor
## 89 versicolor
## 90 versicolor
## 91 versicolor
## 92 versicolor
## 93 versicolor
## 94 versicolor
## 95 versicolor
## 96 versicolor
## 97 versicolor
## 98 versicolor
## 99 versicolor
## 100 versicolor
## 101 virginica
## 102 virginica
## 103 virginica
## 104 virginica
## 105 virginica
## 106 virginica
## 107 virginica
## 108 virginica
## 109 virginica
## 110 virginica
## 111 virginica
## 112 virginica
## 113 virginica
## 114 virginica
## 115 virginica
## 116 virginica
## 117 virginica
## 118 virginica
## 119 virginica
## 120 virginica
## 121 virginica
## 122 virginica
## 123 virginica
## 124 virginica
## 125 virginica
## 126 virginica
## 127 virginica
## 128 virginica
## 129 virginica
## 130 virginica
## 131 virginica
## 132 virginica
## 133 virginica
## 134 virginica
## 135 virginica
## 136 virginica
## 137 virginica
## 138 virginica
## 139 virginica
## 140 virginica
## 141 virginica
## 142 virginica
## 143 virginica
## 144 virginica
## 145 virginica
## 146 virginica
## 147 virginica
## 148 virginica
## 149 virginica
## 150 virginica
x=data_frame(A=c("a","b","c"),B=c("t","u","v"),c=c(1:3))
y=data_frame(A=c("a","b","d"),B=c("t","u","w"),D=c(3,2,1))
x
## # A tibble: 3 x 3
## A B c
## <chr> <chr> <int>
## 1 a t 1
## 2 b u 2
## 3 c v 3
y
## # A tibble: 3 x 3
## A B D
## <chr> <chr> <dbl>
## 1 a t 3.00
## 2 b u 2.00
## 3 d w 1.00
bind_cols(x,y)
## # A tibble: 3 x 6
## A B c A1 B1 D
## <chr> <chr> <int> <chr> <chr> <dbl>
## 1 a t 1 a t 3.00
## 2 b u 2 b u 2.00
## 3 c v 3 d w 1.00
Join matching values from y to x.
left_join(x, y, by = NULL)
## # A tibble: 3 x 4
## A B c D
## <chr> <chr> <int> <dbl>
## 1 a t 1 3.00
## 2 b u 2 2.00
## 3 c v 3 NA
FALSE, suffix=c(“.x”,“.y”),…) Join matching values from x to y.
right_join(x, y, by = NULL)
## # A tibble: 3 x 4
## A B c D
## <chr> <chr> <int> <dbl>
## 1 a t 1 3.00
## 2 b u 2 2.00
## 3 d w NA 1.00
Join data. Retain only rows with matches.
inner_join(x, y, by = NULL)
## # A tibble: 2 x 4
## A B c D
## <chr> <chr> <int> <dbl>
## 1 a t 1 3.00
## 2 b u 2 2.00
merge(x,y)
## A B c D
## 1 a t 1 3
## 2 b u 2 2
Join data. Retain all values, all rows.
full_join(x, y)
## # A tibble: 4 x 4
## A B c D
## <chr> <chr> <int> <dbl>
## 1 a t 1 3.00
## 2 b u 2 2.00
## 3 c v 3 NA
## 4 d w NA 1.00
#union(x,y)
union_all(x,y)
## # A tibble: 6 x 4
## A B c D
## <chr> <chr> <int> <dbl>
## 1 a t 1 NA
## 2 b u 2 NA
## 3 c v 3 NA
## 4 a t NA 3.00
## 5 b u NA 2.00
## 6 d w NA 1.00
specify the column(s) to match on. left_join(x, y, by = “A”)
left_join(x, y, by = "A")
## # A tibble: 3 x 5
## A B.x c B.y D
## <chr> <chr> <int> <chr> <dbl>
## 1 a t 1 t 3.00
## 2 b u 2 u 2.00
## 3 c v 3 <NA> NA
to match on columns with different names in each data set. left_join(x, y, by = c(“C” = “D”))
left_join(x, y, by = c("c" = "D"))
## # A tibble: 3 x 5
## A.x B.x c A.y B.y
## <chr> <chr> <dbl> <chr> <chr>
## 1 a t 1.00 d w
## 2 b u 2.00 b u
## 3 c v 3.00 a t
duplicate column names. left_join(x, y, by = c(“C” = “D”), suffix = c(“1”, “2”))
left_join(x, y, by = c("c" = "D"), suffix =c("1", "2"))
## # A tibble: 3 x 5
## A1 B1 c A2 B2
## <chr> <chr> <dbl> <chr> <chr>
## 1 a t 1.00 d w
## 2 b u 2.00 b u
## 3 c v 3.00 a t
merge(x,y)
## A B c D
## 1 a t 1 3
## 2 b u 2 2
merge(x,y,by="A")
## A B.x c B.y D
## 1 a t 1 t 3
## 2 b u 2 u 2
other as they are.
bind_rows(x,y)
## # A tibble: 6 x 4
## A B c D
## <chr> <chr> <int> <dbl>
## 1 a t 1 NA
## 2 b u 2 NA
## 3 c v 3 NA
## 4 a t NA 3.00
## 5 b u NA 2.00
## 6 d w NA 1.00
Rows that appear in x but not z.
setdiff(x=c(1,2,3,4,5), y=c(3,4,5,6,7))
## [1] 1 2
Rows that appear in x or z.(Duplicates removed). union_all() retains duplicates.
union(x=c(1,2,3,4,5), y=c(3,4,5,6,7))
## [1] 1 2 3 4 5 6 7
to test whether two data sets contain the exact same rows (in any order)
setequal(x,y)
## FALSE: Cols in y but not x: `D`. Cols in x but not y: `c`.
to filter one table against the rows of another.
Return rows of x that have a match in y.
semi_join(x, y, by = NULL)
## # A tibble: 2 x 3
## A B c
## <chr> <chr> <int>
## 1 a t 1
## 2 b u 2
Return rows of x that do not have a match in y.
anti_join(x, y, by = NULL)
## # A tibble: 1 x 3
## A B c
## <chr> <chr> <int>
## 1 c v 3
Rows that appear in both x and z.
x1=data_frame(A=c("a","b","c"),B=c("t","u","v"),C=c(1:3))
y1=data_frame(A=c("a","b","c"),B=c("t","u","w"),C=c(3,2,1))
intersect(x1, y1)
## # A tibble: 1 x 3
## A B C
## <chr> <chr> <dbl>
## 1 b u 2.00
inner_join(x1,y1,by="A")
## # A tibble: 3 x 5
## A B.x C.x B.y C.y
## <chr> <chr> <int> <chr> <dbl>
## 1 a t 1 t 3.00
## 2 b u 2 u 2.00
## 3 c v 3 w 1.00
full_join(x1,y1)
## # A tibble: 5 x 3
## A B C
## <chr> <chr> <dbl>
## 1 a t 1.00
## 2 b u 2.00
## 3 c v 3.00
## 4 a t 3.00
## 5 c w 1.00
In dplyr (and in tidyeval in general) you use !! to say that you want to unquote an input so that it’s evaluated, not quoted. This gives us a function that actually does what we want.
df <- tibble(
g1 = c(1, 1, 2, 2, 2),
g2 = c(1, 2, 1, 2, 1),
a = sample(5),
b = sample(5)
)
my_summarise <- function(df, group_var) {
df %>%
group_by(!!group_var) %>%
summarise(a = mean(a))
}
my_summarise(df, quo(g1))
## # A tibble: 2 x 2
## g1 a
## <dbl> <dbl>
## 1 1.00 3.00
## 2 2.00 3.00
df <- tibble(
g1 = c(1, 1, 2, 2, 2),
g2 = c(1, 2, 1, 2, 1),
a = sample(5),
b = sample(5)
)
df %>%
group_by(g1) %>%
summarise(a = mean(a))
## # A tibble: 2 x 2
## g1 a
## <dbl> <dbl>
## 1 1.00 4.00
## 2 2.00 2.33
###doesnt work
my_summarise <- function(df, group_var) {
df %>%
group_by(group_var) %>%
summarise(a = mean(a))
}
#my_summarise(df, g1)
quo(g1)
## <quosure: global>
## ~g1
quo(a + b + c)
## <quosure: global>
## ~a + b + c
quo("a")
## <quosure: empty>
## ~"a"
# Error in (function (x, strict = TRUE) : must be called in a function
#enquo(g1)
noquote(paste("~","g1")) #equivalent
## [1] ~ g1
quote(a + b + c)
## a + b + c
quote("a + b + c")
## [1] "a + b + c"
f1 <- function(x, y = x) { x <- x + 1; y }
s1 <- function(x, y = substitute(x)) { x <- x + 1; y }
s2 <- function(x, y) { if(missing(y)) y <- substitute(x); x <- x + 1; y }
a <- 10
f1(a) # 11
## [1] 11
s1(a) # 11
## [1] 11
s2(a) # a
## a
typeof(s2(a)) # "symbol"
## [1] "symbol"
my_summarise <- function(df, group_var) {
df %>%
group_by(!!group_var) %>%
summarise(a = mean(a))
}
my_summarise(df, quo(g1))
## # A tibble: 2 x 2
## g1 a
## <dbl> <dbl>
## 1 1.00 4.00
## 2 2.00 2.33
my_summarise <- function(df, group_by) {
group_by <- enquo(group_by)
print(group_by)
df %>%
group_by(!!group_by) %>%
summarise(a = mean(a))
}
my_summarise(df, g1)
## <quosure: global>
## ~g1
## # A tibble: 2 x 2
## g1 a
## <dbl> <dbl>
## 1 1.00 4.00
## 2 2.00 2.33
(If you’re familiar with quote() and substitute() in base R, quo() is equivalent to quote() and enquo() is equivalent to substitute().)
summarise(df, mean = mean(a), sum = sum(a), n = n())
## # A tibble: 1 x 3
## mean sum n
## <dbl> <int> <int>
## 1 3.00 15 5
summarise(df, mean = mean(a * b), sum = sum(a * b), n = n())
## # A tibble: 1 x 3
## mean sum n
## <dbl> <int> <int>
## 1 10.6 53 5
To turn this into a function, we start by testing the basic approach interactively: we quote the variable with quo(), then unquoting it in the dplyr call with !!. Notice that we can unquote anywhere inside a complicated expression.
my_var <- quo(a)
summarise(df, mean = mean(!!my_var), sum = sum(!!my_var), n = n())
## # A tibble: 1 x 3
## mean sum n
## <dbl> <int> <int>
## 1 3.00 15 5
quo(summarise(df,
mean = mean(!!my_var),
sum = sum(!!my_var),
n = n()
))
## <quosure: global>
## ~summarise(df, mean = mean(~a), sum = sum(~a), n = n())
quote(summarise(df,
mean = mean(!!my_var),
sum = sum(!!my_var),
n = n()
))
## summarise(df, mean = mean(!(!my_var)), sum = sum(!(!my_var)),
## n = n())
noquote(paste("~",quo(summarise(df,
mean = mean(!!my_var),
sum = sum(!!my_var),
n = n()
))))
## [1] ~ ~
## [2] ~ summarise(df, mean = mean(~a), sum = sum(~a), n = n())
my_summarise2 <- function(df, expr) {
expr <- enquo(expr)
summarise(df,
mean = mean(!!expr),
sum = sum(!!expr),
n = n()
)
}
my_summarise2(df, a)
## # A tibble: 1 x 3
## mean sum n
## <dbl> <int> <int>
## 1 3.00 15 5
mutate(df, mean_a = mean(a), sum_a = sum(a))
## # A tibble: 5 x 6
## g1 g2 a b mean_a sum_a
## <dbl> <dbl> <int> <int> <dbl> <int>
## 1 1.00 1.00 3 4 3.00 15
## 2 1.00 2.00 5 5 3.00 15
## 3 2.00 1.00 2 1 3.00 15
## 4 2.00 2.00 1 2 3.00 15
## 5 2.00 1.00 4 3 3.00 15
mutate(df, mean_b = mean(b), sum_b = sum(b))
## # A tibble: 5 x 6
## g1 g2 a b mean_b sum_b
## <dbl> <dbl> <int> <int> <dbl> <int>
## 1 1.00 1.00 3 4 3.00 15
## 2 1.00 2.00 5 5 3.00 15
## 3 2.00 1.00 2 1 3.00 15
## 4 2.00 2.00 1 2 3.00 15
## 5 2.00 1.00 4 3 3.00 15
my_mutate <- function(df, expr) {
expr <- enquo(expr)
mean_name <- paste0("mean_", quo_name(expr))
sum_name <- paste0("sum_", quo_name(expr))
mutate(df,
!!mean_name := mean(!!expr),
!!sum_name := sum(!!expr)
)
}
my_mutate(df, a)
## # A tibble: 5 x 6
## g1 g2 a b mean_a sum_a
## <dbl> <dbl> <int> <int> <dbl> <int>
## 1 1.00 1.00 3 4 3.00 15
## 2 1.00 2.00 5 5 3.00 15
## 3 2.00 1.00 2 1 3.00 15
## 4 2.00 2.00 1 2 3.00 15
## 5 2.00 1.00 4 3 3.00 15
Extend my_summarise() to accept any number of grouping variables.Make three changes:
Use … in the function definition so our function can accept any number of arguments.
Use quos() to capture all the … as a list of formulas.
Use !!! instead of !! to splice the arguments into group_by().
my_summarise <- function(df, ...) {
group_by <- quos(...)
df %>%
group_by(!!!group_by) %>%
summarise(a = mean(a))
}
my_summarise(df, g1, g2)
## # A tibble: 4 x 3
## # Groups: g1 [?]
## g1 g2 a
## <dbl> <dbl> <dbl>
## 1 1.00 1.00 3.00
## 2 1.00 2.00 5.00
## 3 2.00 1.00 3.00
## 4 2.00 2.00 1.00
!!! takes a list of elements and splices them into to the current call
args <- list(na.rm = TRUE, trim = 0.25)
quo(mean(x, !!! args))
## <quosure: global>
## ~mean(x, na.rm = TRUE, trim = 0.25)
args <- list(quo(x), na.rm = TRUE, trim = 0.25)
quo(mean(!!! args))
## <quosure: global>
## ~mean(~x, na.rm = TRUE, trim = 0.25)
f <- function(x) {
quo(x)
}
x1 <- f(10)
x2 <- f(100)
x1
## <quosure: local>
## ~x
x2
## <quosure: local>
## ~x
The environments using rlang::get_env() — they’re different.
library(rlang)
get_env(x1)
## <environment: 0x7f9924a3fd88>
get_env(x2)
## <environment: 0x7f9924135c60>
evaluating those formulas using rlang::eval_tidy(), we see that they yield different values
eval_tidy(x1)
## [1] 10
eval_tidy(x2)
## [1] 100
user_var <- 1000
mtcars %>% summarise(cyl = mean(cyl) * user_var)
## cyl
## 1 6187.5
quosures
var <- ~toupper(letters[1:5])
var
## ~toupper(letters[1:5])
# You can extract its expression:
get_expr(var)
## toupper(letters[1:5])
# Or inspect its enclosure:
get_env(var)
## <environment: R_GlobalEnv>
# Here we capture `letters[1:5]` as an expression:
quo(toupper(letters[1:5]))
## <quosure: global>
## ~toupper(letters[1:5])
# Here we capture the value of `letters[1:5]`
quo(toupper(!!letters[1:5]))
## <quosure: global>
## ~toupper(c("a", "b", "c", "d", "e"))
quo(toupper(UQ(letters[1:5])))
## <quosure: global>
## ~toupper(c("a", "b", "c", "d", "e"))
quo(!!toupper(letters[1:5]))
## <quosure: empty>
## ~c("A", "B", "C", "D", "E")
my_mutate <- function(x) {
mtcars %>%
select(cyl) %>%
slice(1:4) %>%
mutate(cyl2 = cyl + (!! x))
}
f <- function(x) quo(x)
expr1 <- f(100)
f_1<- function(x) quo(!!x)
expr2 <- f_1(10)
expr1
## <quosure: local>
## ~x
expr2
## <quosure: empty>
## ~10
my_mutate(expr1)
## # A tibble: 4 x 2
## cyl cyl2
## <dbl> <dbl>
## 1 6.00 106
## 2 6.00 106
## 3 4.00 104
## 4 6.00 106
my_mutate(expr2)
## # A tibble: 4 x 2
## cyl cyl2
## <dbl> <dbl>
## 1 6.00 16.0
## 2 6.00 16.0
## 3 4.00 14.0
## 4 6.00 16.0
my_fun <- quo(fun)
#quo(!!my_fun(x, y, z))
#Error in my_fun(x, y, z) : could not find function "my_fun"
quo(UQ(my_fun)(x, y, z))
## <quosure: global>
## ~(~fun)(x, y, z)
my_var <- quo(x)
quo(filter(df, !!my_var == 1))
## <quosure: global>
## ~filter(df, FALSE)
quo(filter(df, UQ(my_var) == 1))
## <quosure: global>
## ~filter(df, (~x) == 1)
UQ() yields a quosure containing a formula. That ensures that when the quosure is evaluated, it’ll be looked up in the right environment. In certain code-generation scenarios you just want to use expression and ignore the environment. That’s the job of UQE().
#rlang::quasiquotation
my_var <- quo(x)
quo(UQE(my_fun)(x, y, z))
## <quosure: global>
## ~fun(x, y, z)
quo(filter(df, UQE(my_var) == 1))
## <quosure: global>
## ~filter(df, x == 1)
UQE(my_var)
## x
#### Unquote-splicing
The second unquote operation is unquote-splicing. Its functional form is UQS() and the syntactic shortcut is !!!. It takes a vector and inserts each element of the vector in the surrounding function call:
quo(list(!!! letters[1:5]))
## <quosure: global>
## ~list("a", "b", "c", "d", "e")
quo(list(!! letters[1:5]))
## <quosure: global>
## ~list(c("a", "b", "c", "d", "e"))
quo(list( UQS(letters[1:5])))
## <quosure: global>
## ~list("a", "b", "c", "d", "e")
x <- list(foo = 1L, bar = quo(baz))
quo(list(!!! x))
## <quosure: global>
## ~list(foo = 1L, bar = ~baz)
args <- list(mean = quo(mean(cyl)), count =quo( n()))
args
## $mean
## <quosure: global>
## ~mean(cyl)
##
## $count
## <quosure: global>
## ~n()
mtcars %>%
group_by(am) %>%
summarise(!!! args)
## # A tibble: 2 x 3
## am mean count
## <dbl> <dbl> <int>
## 1 0 6.95 19
## 2 1.00 5.08 13
The final unquote operation is setting argument names. You’ve seen one way to do that above, but you can also use the definition operator := instead of =. := supports unquoting on both the LHS and the RHS.
The rules on the LHS are slightly different: the unquoted operand should evaluate to a string or a symbol.
mean_nm <- "mean"
count_nm <- "count"
mtcars %>%
group_by(am) %>%
summarise(
!!mean_nm := mean(cyl),
!!count_nm := n()
)
## # A tibble: 2 x 3
## am mean count
## <dbl> <dbl> <int>
## 1 0 6.95 19
## 2 1.00 5.08 13
#tidy_names()
#tidyverse_update()
#tidyverse_conflicts()
#tidyverse_deps()
#install.packages("tidyverse")
tidyverse_logo()
## ⬢ __ _ __ . ⬡ ⬢ .
## / /_(_)__/ /_ ___ _____ _______ ___
## / __/ / _ / // / |/ / -_) __(_-</ -_)
## \__/_/\_,_/\_, /|___/\__/_/ /___/\__/
## ⬢ . /___/ ⬡ . ⬢