Dplyr Tutorial 2

Loading Packages
Manipulate Variables
programming with dplyr,rlang

Loading Packages

Let’s look at some very useful but lesser known dplyr functions. unite function collapse two separate column into one column whereas separate breaks one column into two new ones. The function is by defined as unite(data, col, …, sep = “_“, remove = TRUE)

library(tidyverse)


mtcars%>%head()

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

unite_(mtcars, "vs_am", c("vs","am"))%>%head()

##                    mpg cyl disp  hp drat    wt  qsec vs_am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46   0_1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02   0_1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61   1_1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44   1_0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02   0_0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22   1_0    3    1

unite_(mtcars, "vs:am", c("vs","am"),sep=":")%>%head()

##                    mpg cyl disp  hp drat    wt  qsec vs:am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46   0:1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02   0:1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61   1:1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44   1:0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02   0:0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22   1:0    3    1

# Separate is the complement of unite
mtcars %>%
  unite(vs_am, vs, am) %>%
  separate(vs_am, c("vs", "am"))%>%head()

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Manipulate Variables

select(.data, …)

Extract columns by name. Also select_if()

select_all(mtcars, funs(toupper(.)))%>%head()

##                    MPG CYL DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

# collect names of all categorical variables
vars <-function(x) names(x)[grepl('Sepal', names(x))]

iris[,vars(iris)]%>%head()

##   Sepal.Length Sepal.Width
## 1          5.1         3.5
## 2          4.9         3.0
## 3          4.7         3.2
## 4          4.6         3.1
## 5          5.0         3.6
## 6          5.4         3.9

select_if(iris, is.factor, toupper)%>%head()

##   SPECIES
## 1  setosa
## 2  setosa
## 3  setosa
## 4  setosa
## 5  setosa
## 6  setosa

#select_if(iris, vars, toupper)
# Selection drops unselected variables:
is_whole <- function(x) all(floor(x) == x)
select_if(mtcars, is_whole, toupper)%>%head()

##                   CYL  HP VS AM GEAR CARB
## Mazda RX4           6 110  0  1    4    4
## Mazda RX4 Wag       6 110  0  1    4    4
## Datsun 710          4  93  1  1    4    1
## Hornet 4 Drive      6 110  1  0    3    1
## Hornet Sportabout   8 175  0  0    3    2
## Valiant             6 105  1  0    3    1

# But renaming retains them:
rename_if(mtcars, is_whole, toupper)%>%head()

##                    mpg CYL disp  HP drat    wt  qsec VS AM GEAR CARB
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

# The renaming function is optional for selection:
select_if(mtcars, is_whole)%>%head()

##                   cyl  hp vs am gear carb
## Mazda RX4           6 110  0  1    4    4
## Mazda RX4 Wag       6 110  0  1    4    4
## Datsun 710          4  93  1  1    4    1
## Hornet 4 Drive      6 110  1  0    3    1
## Hornet Sportabout   8 175  0  0    3    2
## Valiant             6 105  1  0    3    1

starts_with(" “)

#Use these helpers with select (),
select(iris, starts_with("Sepal"))%>%head()

##   Sepal.Length Sepal.Width
## 1          5.1         3.5
## 2          4.9         3.0
## 3          4.7         3.2
## 4          4.6         3.1
## 5          5.0         3.6
## 6          5.4         3.9

select(iris, ends_with("Width"))%>%head()

##   Sepal.Width Petal.Width
## 1         3.5         0.2
## 2         3.0         0.2
## 3         3.2         0.2
## 4         3.1         0.2
## 5         3.6         0.2
## 6         3.9         0.4

select helpers

nms <- iris%>%head(3)

select(nms, contains("Sepal"))

##   Sepal.Length Sepal.Width
## 1          5.1         3.5
## 2          4.9         3.0
## 3          4.7         3.2

select(iris, contains("Petal"))

##     Petal.Length Petal.Width
## 1            1.4         0.2
## 2            1.4         0.2
## 3            1.3         0.2
## 4            1.5         0.2
## 5            1.4         0.2
## 6            1.7         0.4
## 7            1.4         0.3
## 8            1.5         0.2
## 9            1.4         0.2
## 10           1.5         0.1
## 11           1.5         0.2
## 12           1.6         0.2
## 13           1.4         0.1
## 14           1.1         0.1
## 15           1.2         0.2
## 16           1.5         0.4
## 17           1.3         0.4
## 18           1.4         0.3
## 19           1.7         0.3
## 20           1.5         0.3
## 21           1.7         0.2
## 22           1.5         0.4
## 23           1.0         0.2
## 24           1.7         0.5
## 25           1.9         0.2
## 26           1.6         0.2
## 27           1.6         0.4
## 28           1.5         0.2
## 29           1.4         0.2
## 30           1.6         0.2
## 31           1.6         0.2
## 32           1.5         0.4
## 33           1.5         0.1
## 34           1.4         0.2
## 35           1.5         0.2
## 36           1.2         0.2
## 37           1.3         0.2
## 38           1.4         0.1
## 39           1.3         0.2
## 40           1.5         0.2
## 41           1.3         0.3
## 42           1.3         0.3
## 43           1.3         0.2
## 44           1.6         0.6
## 45           1.9         0.4
## 46           1.4         0.3
## 47           1.6         0.2
## 48           1.4         0.2
## 49           1.5         0.2
## 50           1.4         0.2
## 51           4.7         1.4
## 52           4.5         1.5
## 53           4.9         1.5
## 54           4.0         1.3
## 55           4.6         1.5
## 56           4.5         1.3
## 57           4.7         1.6
## 58           3.3         1.0
## 59           4.6         1.3
## 60           3.9         1.4
## 61           3.5         1.0
## 62           4.2         1.5
## 63           4.0         1.0
## 64           4.7         1.4
## 65           3.6         1.3
## 66           4.4         1.4
## 67           4.5         1.5
## 68           4.1         1.0
## 69           4.5         1.5
## 70           3.9         1.1
## 71           4.8         1.8
## 72           4.0         1.3
## 73           4.9         1.5
## 74           4.7         1.2
## 75           4.3         1.3
## 76           4.4         1.4
## 77           4.8         1.4
## 78           5.0         1.7
## 79           4.5         1.5
## 80           3.5         1.0
## 81           3.8         1.1
## 82           3.7         1.0
## 83           3.9         1.2
## 84           5.1         1.6
## 85           4.5         1.5
## 86           4.5         1.6
## 87           4.7         1.5
## 88           4.4         1.3
## 89           4.1         1.3
## 90           4.0         1.3
## 91           4.4         1.2
## 92           4.6         1.4
## 93           4.0         1.2
## 94           3.3         1.0
## 95           4.2         1.3
## 96           4.2         1.2
## 97           4.2         1.3
## 98           4.3         1.3
## 99           3.0         1.1
## 100          4.1         1.3
## 101          6.0         2.5
## 102          5.1         1.9
## 103          5.9         2.1
## 104          5.6         1.8
## 105          5.8         2.2
## 106          6.6         2.1
## 107          4.5         1.7
## 108          6.3         1.8
## 109          5.8         1.8
## 110          6.1         2.5
## 111          5.1         2.0
## 112          5.3         1.9
## 113          5.5         2.1
## 114          5.0         2.0
## 115          5.1         2.4
## 116          5.3         2.3
## 117          5.5         1.8
## 118          6.7         2.2
## 119          6.9         2.3
## 120          5.0         1.5
## 121          5.7         2.3
## 122          4.9         2.0
## 123          6.7         2.0
## 124          4.9         1.8
## 125          5.7         2.1
## 126          6.0         1.8
## 127          4.8         1.8
## 128          4.9         1.8
## 129          5.6         2.1
## 130          5.8         1.6
## 131          6.1         1.9
## 132          6.4         2.0
## 133          5.6         2.2
## 134          5.1         1.5
## 135          5.6         1.4
## 136          6.1         2.3
## 137          5.6         2.4
## 138          5.5         1.8
## 139          4.8         1.8
## 140          5.4         2.1
## 141          5.6         2.4
## 142          5.1         2.3
## 143          5.1         1.9
## 144          5.9         2.3
## 145          5.7         2.5
## 146          5.2         2.3
## 147          5.0         1.9
## 148          5.2         2.0
## 149          5.4         2.3
## 150          5.1         1.8

select(nms, starts_with("Petal"))

##   Petal.Length Petal.Width
## 1          1.4         0.2
## 2          1.4         0.2
## 3          1.3         0.2

select(nms, ends_with("Width"))

##   Sepal.Width Petal.Width
## 1         3.5         0.2
## 2         3.0         0.2
## 3         3.2         0.2

select(nms, contains("etal"))

##   Petal.Length Petal.Width
## 1          1.4         0.2
## 2          1.4         0.2
## 3          1.3         0.2

select(nms, matches(".t."))

##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
## 2          4.9         3.0          1.4         0.2
## 3          4.7         3.2          1.3         0.2

select(nms, Petal.Length, Petal.Width)

##   Petal.Length Petal.Width
## 1          1.4         0.2
## 2          1.4         0.2
## 3          1.3         0.2

select(nms, everything())

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa

vars <- c("Petal.Length", "Petal.Width")
select(nms, one_of(vars))

##   Petal.Length Petal.Width
## 1          1.4         0.2
## 2          1.4         0.2
## 3          1.3         0.2

library(tidyselect)

vars_select(names(iris), last_col())

##   Species 
## "Species"

vars_select(names(iris), last_col(offset = 2))

##   Petal.Length 
## "Petal.Length"

# Keep variables
vars_select(names(iris), everything())

##   Sepal.Length    Sepal.Width   Petal.Length    Petal.Width        Species 
## "Sepal.Length"  "Sepal.Width" "Petal.Length"  "Petal.Width"      "Species"

vars_select(names(iris), starts_with("Petal"))

##   Petal.Length    Petal.Width 
## "Petal.Length"  "Petal.Width"

vars_select(names(iris), ends_with("Width"))

##   Sepal.Width   Petal.Width 
## "Sepal.Width" "Petal.Width"

vars_select(names(iris), contains("etal"))

##   Petal.Length    Petal.Width 
## "Petal.Length"  "Petal.Width"

vars_select(names(iris), matches(".t."))

##   Sepal.Length    Sepal.Width   Petal.Length    Petal.Width 
## "Sepal.Length"  "Sepal.Width" "Petal.Length"  "Petal.Width"

vars_select(names(iris), Petal.Length, Petal.Width)

##   Petal.Length    Petal.Width 
## "Petal.Length"  "Petal.Width"

vars_select(names(iris), one_of("Petal.Length", "Petal.Width"))

##   Petal.Length    Petal.Width 
## "Petal.Length"  "Petal.Width"

# Drop variables
vars_select(names(iris), -starts_with("Petal"))

##   Sepal.Length    Sepal.Width        Species 
## "Sepal.Length"  "Sepal.Width"      "Species"

vars_select(names(iris), -ends_with("Width"))

##   Sepal.Length   Petal.Length        Species 
## "Sepal.Length" "Petal.Length"      "Species"

vars_select(names(iris), -contains("etal"))

##   Sepal.Length    Sepal.Width        Species 
## "Sepal.Length"  "Sepal.Width"      "Species"

vars_select(names(iris), -matches(".t."))

##   Species 
## "Species"

vars_select(names(iris), -Petal.Length, -Petal.Width)

##   Sepal.Length    Sepal.Width        Species 
## "Sepal.Length"  "Sepal.Width"      "Species"

# Rename variables
vars_select(names(iris), petal_length = Petal.Length)

##   petal_length 
## "Petal.Length"

vars_select(names(iris), petal = starts_with("Petal"))

##         petal1         petal2 
## "Petal.Length"  "Petal.Width"

# Rename variables preserving all existing
vars_rename(names(iris), petal_length = Petal.Length)

##   Sepal.Length    Sepal.Width   petal_length    Petal.Width        Species 
## "Sepal.Length"  "Sepal.Width" "Petal.Length"  "Petal.Width"      "Species"

# You can unquote symbols or quosures
vars_select(names(iris), !! quote(Petal.Length))

##   Petal.Length 
## "Petal.Length"

# And unquote-splice lists of symbols or quosures
vars_select(names(iris), !!! list(quo(Petal.Length), quote(Petal.Width)))

##   Petal.Length    Petal.Width 
## "Petal.Length"  "Petal.Width"

mutate(.data, …)

Compute new column(s).

mutate(mtcars, gpm = 1/mpg)%>%head()

##    mpg cyl disp  hp drat    wt  qsec vs am gear carb        gpm
## 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 0.04761905
## 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 0.04761905
## 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 0.04385965
## 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 0.04672897
## 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 0.05347594
## 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 0.05524862

transmute(.data, …)

Compute new column(s), drop others.

transmute(mtcars, gpm = 1/mpg)%>%head()

##          gpm
## 1 0.04761905
## 2 0.04761905
## 3 0.04385965
## 4 0.04672897
## 5 0.05347594
## 6 0.05524862

mutate_all(.tbl, .funs, …) Apply funs to every

column. Use with funs().

f=faithful%>%head(4)

mutate_all(f, funs(log(.), log2(.)))

##   eruptions waiting eruptions_log waiting_log eruptions_log2 waiting_log2
## 1     3.600      79     1.2809338    4.369448      1.8479969     6.303781
## 2     1.800      54     0.5877867    3.988984      0.8479969     5.754888
## 3     3.333      74     1.2038728    4.304065      1.7368213     6.209453
## 4     2.283      62     0.8254904    4.127134      1.1909309     5.954196

mutate_if(.tbl, .predicate, .funs, …)

Apply funs to all columns of one type. Use with funs().

mutate_if(iris, is.numeric, funs(log(.)))%>%head()

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1     1.629241    1.252763    0.3364722  -1.6094379  setosa
## 2     1.589235    1.098612    0.3364722  -1.6094379  setosa
## 3     1.547563    1.163151    0.2623643  -1.6094379  setosa
## 4     1.526056    1.131402    0.4054651  -1.6094379  setosa
## 5     1.609438    1.280934    0.3364722  -1.6094379  setosa
## 6     1.686399    1.360977    0.5306283  -0.9162907  setosa

mutate_at(.tbl, .cols, .funs, …) Apply funs to

specific columns. Use with funs(), vars() and the helper functions for select().

# Take log of all columns except Species
mutate_at(iris, vars( -Species), funs(log(.)))%>%head()

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1     1.629241    1.252763    0.3364722  -1.6094379  setosa
## 2     1.589235    1.098612    0.3364722  -1.6094379  setosa
## 3     1.547563    1.163151    0.2623643  -1.6094379  setosa
## 4     1.526056    1.131402    0.4054651  -1.6094379  setosa
## 5     1.609438    1.280934    0.3364722  -1.6094379  setosa
## 6     1.686399    1.360977    0.5306283  -0.9162907  setosa

add_column(.data, …, .before = NULL, .after =NULL)

Add new column(s).

add_column(mtcars, new = 1:32)%>%head(4)

##                 mpg cyl disp  hp drat    wt  qsec vs am gear carb new
## Mazda RX4      21.0   6  160 110 3.90 2.620 16.46  0  1    4    4   1
## Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4   2
## Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   3
## Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1   4

Manipulate Cases

filter(.data, …) Extract rows that meet logical criteria. Also filter_().

filter(iris, Sepal.Length > 7)%>%head(4)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 1          7.1         3.0          5.9         2.1 virginica
## 2          7.6         3.0          6.6         2.1 virginica
## 3          7.3         2.9          6.3         1.8 virginica
## 4          7.2         3.6          6.1         2.5 virginica

distinct(.data, …, .keep_all = FALSE) Remove rows with duplicate values. Also distinct_().

distinct(iris, Species)%>%head()

##      Species
## 1     setosa
## 2 versicolor
## 3  virginica

sample_frac(tbl, size = 1, replace = FALSE,weight = NULL, .env = parent.frame())

Randomly select fraction of rows.

sample_frac(iris, 0.5, replace = TRUE)%>%head()

##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 19           5.7         3.8          1.7         0.3     setosa
## 56           5.7         2.8          4.5         1.3 versicolor
## 49           5.3         3.7          1.5         0.2     setosa
## 133          6.4         2.8          5.6         2.2  virginica
## 128          6.1         3.0          4.9         1.8  virginica
## 62           5.9         3.0          4.2         1.5 versicolor

sample_n(tbl, size, replace = FALSE, weight =NULL, .env = parent.frame())

Randomly select size rows.

sample_n(iris, 10, replace = TRUE)%>%head(4)

##    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 53          6.9         3.1          4.9         1.5 versicolor
## 82          5.5         2.4          3.7         1.0 versicolor
## 44          5.0         3.5          1.6         0.6     setosa
## 58          4.9         2.4          3.3         1.0 versicolor

slice(.data, …) Select rows by position. Also slice_().

dplyr::slice(iris, 10:15)

## # A tibble: 6 x 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
## 1         4.90        3.10         1.50       0.100 setosa 
## 2         5.40        3.70         1.50       0.200 setosa 
## 3         4.80        3.40         1.60       0.200 setosa 
## 4         4.80        3.00         1.40       0.100 setosa 
## 5         4.30        3.00         1.10       0.100 setosa 
## 6         5.80        4.00         1.20       0.200 setosa

top_n(x, n, wt) Select and order top n entries (bygroup if grouped data).

top_n(iris, 3, Sepal.Width)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.7         4.4          1.5         0.4  setosa
## 2          5.2         4.1          1.5         0.1  setosa
## 3          5.5         4.2          1.4         0.2  setosa

#select the bottom three rows
top_n(iris, -3, Sepal.Width)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          5.0         2.0          3.5         1.0 versicolor
## 2          6.0         2.2          4.0         1.0 versicolor
## 3          6.2         2.2          4.5         1.5 versicolor
## 4          6.0         2.2          5.0         1.5  virginica

arrange(.data, …) Order rows by values of a column (low to high), use with desc() to order from high to low.

arrange(mtcars, mpg)%>%head(4)

##    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
## 2 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
## 3 13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
## 4 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4

arrange(mtcars, desc(mpg))%>%head(4)

##    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
## 2 32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
## 3 30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
## 4 30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2

add_row(.data, …, .before = NULL, .after = NULL)

Add one or more rows to a table.

add_row(faithful, eruptions = 1, waiting = 1)%>%tail(3)

##     eruptions waiting
## 271     1.817      46
## 272     4.467      74
## 273     1.000       1

Summarise Cases

summarise(.data, …)

Compute table of summaries. Also summarise_().

summarise(mtcars, avg = mean(mpg))

##        avg
## 1 20.09062

count(x, …, wt = NULL, sort = FALSE)

Count number of rows in each group defined by the variables in … Also tally().

count(iris, Species)

## # A tibble: 3 x 2
##   Species        n
##   <fct>      <int>
## 1 setosa        50
## 2 versicolor    50
## 3 virginica     50

#dplyr::tally(iris, Species)

Group Cases

mtcars %>%
group_by(cyl) %>%
summarise(avg = mean(mpg))

## # A tibble: 3 x 2
##     cyl   avg
##   <dbl> <dbl>
## 1  4.00  26.7
## 2  6.00  19.7
## 3  8.00  15.1

ungroup(x, …)

Returns ungrouped copy of table.

g_iris=mtcars[1:10,] %>%
group_by(cyl)

ungroup(g_iris)%>%head(4)

## # A tibble: 4 x 11
##     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  21.0  6.00   160 110    3.90  2.62  16.5  0     1.00  4.00  4.00
## 2  21.0  6.00   160 110    3.90  2.88  17.0  0     1.00  4.00  4.00
## 3  22.8  4.00   108  93.0  3.85  2.32  18.6  1.00  1.00  4.00  1.00
## 4  21.4  6.00   258 110    3.08  3.22  19.4  1.00  0     3.00  1.00

Vectorized Functions

#OFFSETS
set.seed(1)
#x=ceiling(rnorm(10))
x=1:10
diff(x)

## [1] 1 1 1 1 1 1 1 1 1

dplyr::lag(x)  # - Offset elements by 1

##  [1] NA  1  2  3  4  5  6  7  8  9

dplyr::lead(x)   #- Offset elements by -1

##  [1]  2  3  4  5  6  7  8  9 10 NA

#CUMULATIVE AGGREGATES

dplyr::cumall(x) # - Cumulative all()

##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

dplyr::cumany(x)  #- Cumulative any()

##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

cummax(x) #- Cumulative max()

##  [1]  1  2  3  4  5  6  7  8  9 10

dplyr::cummean(x) #- Cumulative mean()

##  [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5

cummin(x) #- Cumulative min()

##  [1] 1 1 1 1 1 1 1 1 1 1

cumprod(x) #- Cumulative prod()

##  [1]       1       2       6      24     120     720    5040   40320
##  [9]  362880 3628800

cumsum(x) #- Cumulative sum()

##  [1]  1  3  6 10 15 21 28 36 45 55

RANKINGS

x=c(1:3,3,2,4)
x1=1:5
dplyr::cume_dist(x1)   #- Proportion of all values <=

## [1] 0.2 0.4 0.6 0.8 1.0

dplyr::dense_rank(x)   #- rank with ties = min, no gaps

## [1] 1 2 3 3 2 4

dplyr::min_rank(x)     #- rank with ties = min

## [1] 1 2 4 4 2 6

dplyr::ntile(x,n=5)   #- bins into n bins

## [1] 1 1 3 4 2 5

dplyr::percent_rank(x1)   #- min_rank scaled to [0,1]

## [1] 0.00 0.25 0.50 0.75 1.00

dplyr::row_number(x)     #- rank with ties = "first"

## [1] 1 2 4 5 3 6

MISC

x <- rnorm(10)
x[between(x, -1, 1)]

## [1] -0.6264538  0.1836433 -0.8356286  0.3295078 -0.8204684  0.4874291
## [7]  0.7383247  0.5757814 -0.3053884

between(x, -1, 1)

##  [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

#dplyr::between()   #- x >= left & x <= right
# dplyr::case_when() - multi-case if_else()
# dplyr::coalesce() - first non-NA values by element across a set of vectors
# dplyr::if_else() - element-wise if() + else()
# dplyr::na_if() - replace specific values with NA
# pmax() - element-wise max()
# pmin() - element-wise min()
# dplyr::recode() - Vectorized switch()
# dplyr::recode_factor() #- Vectorized switch() for factors

Summary Functions

TO USE WITH SUMMARISE ()

#COUNTS


#carriers <- group_by(flights, carrier)
#summarise(carriers, n())

iris%>%group_by(Species) %>%dplyr::summarise(n()) #- number of values/rows

## # A tibble: 3 x 2
##   Species    `n()`
##   <fct>      <int>
## 1 setosa        50
## 2 versicolor    50
## 3 virginica     50

dplyr::n_distinct(iris$Species)  # - # of uniques

## [1] 3

sum(!is.na(iris)) #- # of non-NA’s

## [1] 750

length(unique(iris$Species))

## [1] 3

iris%>%group_by(Species) %>%dplyr::count() #- number of values/rows

## # A tibble: 3 x 2
## # Groups:   Species [3]
##   Species        n
##   <fct>      <int>
## 1 setosa        50
## 2 versicolor    50
## 3 virginica     50

Row Names

rownames_to_column()

Move row names into col.

a <- rownames_to_column(iris, var
= "row_number")

a%>%head()

##   row_number Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          1          5.1         3.5          1.4         0.2  setosa
## 2          2          4.9         3.0          1.4         0.2  setosa
## 3          3          4.7         3.2          1.3         0.2  setosa
## 4          4          4.6         3.1          1.5         0.2  setosa
## 5          5          5.0         3.6          1.4         0.2  setosa
## 6          6          5.4         3.9          1.7         0.4  setosa

column_to_rownames()

Move col in row names.

column_to_rownames(a, var = "row_number")%>%head()

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

remove_rownames(a)

##     row_number Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1            1          5.1         3.5          1.4         0.2
## 2            2          4.9         3.0          1.4         0.2
## 3            3          4.7         3.2          1.3         0.2
## 4            4          4.6         3.1          1.5         0.2
## 5            5          5.0         3.6          1.4         0.2
## 6            6          5.4         3.9          1.7         0.4
## 7            7          4.6         3.4          1.4         0.3
## 8            8          5.0         3.4          1.5         0.2
## 9            9          4.4         2.9          1.4         0.2
## 10          10          4.9         3.1          1.5         0.1
## 11          11          5.4         3.7          1.5         0.2
## 12          12          4.8         3.4          1.6         0.2
## 13          13          4.8         3.0          1.4         0.1
## 14          14          4.3         3.0          1.1         0.1
## 15          15          5.8         4.0          1.2         0.2
## 16          16          5.7         4.4          1.5         0.4
## 17          17          5.4         3.9          1.3         0.4
## 18          18          5.1         3.5          1.4         0.3
## 19          19          5.7         3.8          1.7         0.3
## 20          20          5.1         3.8          1.5         0.3
## 21          21          5.4         3.4          1.7         0.2
## 22          22          5.1         3.7          1.5         0.4
## 23          23          4.6         3.6          1.0         0.2
## 24          24          5.1         3.3          1.7         0.5
## 25          25          4.8         3.4          1.9         0.2
## 26          26          5.0         3.0          1.6         0.2
## 27          27          5.0         3.4          1.6         0.4
## 28          28          5.2         3.5          1.5         0.2
## 29          29          5.2         3.4          1.4         0.2
## 30          30          4.7         3.2          1.6         0.2
## 31          31          4.8         3.1          1.6         0.2
## 32          32          5.4         3.4          1.5         0.4
## 33          33          5.2         4.1          1.5         0.1
## 34          34          5.5         4.2          1.4         0.2
## 35          35          4.9         3.1          1.5         0.2
## 36          36          5.0         3.2          1.2         0.2
## 37          37          5.5         3.5          1.3         0.2
## 38          38          4.9         3.6          1.4         0.1
## 39          39          4.4         3.0          1.3         0.2
## 40          40          5.1         3.4          1.5         0.2
## 41          41          5.0         3.5          1.3         0.3
## 42          42          4.5         2.3          1.3         0.3
## 43          43          4.4         3.2          1.3         0.2
## 44          44          5.0         3.5          1.6         0.6
## 45          45          5.1         3.8          1.9         0.4
## 46          46          4.8         3.0          1.4         0.3
## 47          47          5.1         3.8          1.6         0.2
## 48          48          4.6         3.2          1.4         0.2
## 49          49          5.3         3.7          1.5         0.2
## 50          50          5.0         3.3          1.4         0.2
## 51          51          7.0         3.2          4.7         1.4
## 52          52          6.4         3.2          4.5         1.5
## 53          53          6.9         3.1          4.9         1.5
## 54          54          5.5         2.3          4.0         1.3
## 55          55          6.5         2.8          4.6         1.5
## 56          56          5.7         2.8          4.5         1.3
## 57          57          6.3         3.3          4.7         1.6
## 58          58          4.9         2.4          3.3         1.0
## 59          59          6.6         2.9          4.6         1.3
## 60          60          5.2         2.7          3.9         1.4
## 61          61          5.0         2.0          3.5         1.0
## 62          62          5.9         3.0          4.2         1.5
## 63          63          6.0         2.2          4.0         1.0
## 64          64          6.1         2.9          4.7         1.4
## 65          65          5.6         2.9          3.6         1.3
## 66          66          6.7         3.1          4.4         1.4
## 67          67          5.6         3.0          4.5         1.5
## 68          68          5.8         2.7          4.1         1.0
## 69          69          6.2         2.2          4.5         1.5
## 70          70          5.6         2.5          3.9         1.1
## 71          71          5.9         3.2          4.8         1.8
## 72          72          6.1         2.8          4.0         1.3
## 73          73          6.3         2.5          4.9         1.5
## 74          74          6.1         2.8          4.7         1.2
## 75          75          6.4         2.9          4.3         1.3
## 76          76          6.6         3.0          4.4         1.4
## 77          77          6.8         2.8          4.8         1.4
## 78          78          6.7         3.0          5.0         1.7
## 79          79          6.0         2.9          4.5         1.5
## 80          80          5.7         2.6          3.5         1.0
## 81          81          5.5         2.4          3.8         1.1
## 82          82          5.5         2.4          3.7         1.0
## 83          83          5.8         2.7          3.9         1.2
## 84          84          6.0         2.7          5.1         1.6
## 85          85          5.4         3.0          4.5         1.5
## 86          86          6.0         3.4          4.5         1.6
## 87          87          6.7         3.1          4.7         1.5
## 88          88          6.3         2.3          4.4         1.3
## 89          89          5.6         3.0          4.1         1.3
## 90          90          5.5         2.5          4.0         1.3
## 91          91          5.5         2.6          4.4         1.2
## 92          92          6.1         3.0          4.6         1.4
## 93          93          5.8         2.6          4.0         1.2
## 94          94          5.0         2.3          3.3         1.0
## 95          95          5.6         2.7          4.2         1.3
## 96          96          5.7         3.0          4.2         1.2
## 97          97          5.7         2.9          4.2         1.3
## 98          98          6.2         2.9          4.3         1.3
## 99          99          5.1         2.5          3.0         1.1
## 100        100          5.7         2.8          4.1         1.3
## 101        101          6.3         3.3          6.0         2.5
## 102        102          5.8         2.7          5.1         1.9
## 103        103          7.1         3.0          5.9         2.1
## 104        104          6.3         2.9          5.6         1.8
## 105        105          6.5         3.0          5.8         2.2
## 106        106          7.6         3.0          6.6         2.1
## 107        107          4.9         2.5          4.5         1.7
## 108        108          7.3         2.9          6.3         1.8
## 109        109          6.7         2.5          5.8         1.8
## 110        110          7.2         3.6          6.1         2.5
## 111        111          6.5         3.2          5.1         2.0
## 112        112          6.4         2.7          5.3         1.9
## 113        113          6.8         3.0          5.5         2.1
## 114        114          5.7         2.5          5.0         2.0
## 115        115          5.8         2.8          5.1         2.4
## 116        116          6.4         3.2          5.3         2.3
## 117        117          6.5         3.0          5.5         1.8
## 118        118          7.7         3.8          6.7         2.2
## 119        119          7.7         2.6          6.9         2.3
## 120        120          6.0         2.2          5.0         1.5
## 121        121          6.9         3.2          5.7         2.3
## 122        122          5.6         2.8          4.9         2.0
## 123        123          7.7         2.8          6.7         2.0
## 124        124          6.3         2.7          4.9         1.8
## 125        125          6.7         3.3          5.7         2.1
## 126        126          7.2         3.2          6.0         1.8
## 127        127          6.2         2.8          4.8         1.8
## 128        128          6.1         3.0          4.9         1.8
## 129        129          6.4         2.8          5.6         2.1
## 130        130          7.2         3.0          5.8         1.6
## 131        131          7.4         2.8          6.1         1.9
## 132        132          7.9         3.8          6.4         2.0
## 133        133          6.4         2.8          5.6         2.2
## 134        134          6.3         2.8          5.1         1.5
## 135        135          6.1         2.6          5.6         1.4
## 136        136          7.7         3.0          6.1         2.3
## 137        137          6.3         3.4          5.6         2.4
## 138        138          6.4         3.1          5.5         1.8
## 139        139          6.0         3.0          4.8         1.8
## 140        140          6.9         3.1          5.4         2.1
## 141        141          6.7         3.1          5.6         2.4
## 142        142          6.9         3.1          5.1         2.3
## 143        143          5.8         2.7          5.1         1.9
## 144        144          6.8         3.2          5.9         2.3
## 145        145          6.7         3.3          5.7         2.5
## 146        146          6.7         3.0          5.2         2.3
## 147        147          6.3         2.5          5.0         1.9
## 148        148          6.5         3.0          5.2         2.0
## 149        149          6.2         3.4          5.4         2.3
## 150        150          5.9         3.0          5.1         1.8
##        Species
## 1       setosa
## 2       setosa
## 3       setosa
## 4       setosa
## 5       setosa
## 6       setosa
## 7       setosa
## 8       setosa
## 9       setosa
## 10      setosa
## 11      setosa
## 12      setosa
## 13      setosa
## 14      setosa
## 15      setosa
## 16      setosa
## 17      setosa
## 18      setosa
## 19      setosa
## 20      setosa
## 21      setosa
## 22      setosa
## 23      setosa
## 24      setosa
## 25      setosa
## 26      setosa
## 27      setosa
## 28      setosa
## 29      setosa
## 30      setosa
## 31      setosa
## 32      setosa
## 33      setosa
## 34      setosa
## 35      setosa
## 36      setosa
## 37      setosa
## 38      setosa
## 39      setosa
## 40      setosa
## 41      setosa
## 42      setosa
## 43      setosa
## 44      setosa
## 45      setosa
## 46      setosa
## 47      setosa
## 48      setosa
## 49      setosa
## 50      setosa
## 51  versicolor
## 52  versicolor
## 53  versicolor
## 54  versicolor
## 55  versicolor
## 56  versicolor
## 57  versicolor
## 58  versicolor
## 59  versicolor
## 60  versicolor
## 61  versicolor
## 62  versicolor
## 63  versicolor
## 64  versicolor
## 65  versicolor
## 66  versicolor
## 67  versicolor
## 68  versicolor
## 69  versicolor
## 70  versicolor
## 71  versicolor
## 72  versicolor
## 73  versicolor
## 74  versicolor
## 75  versicolor
## 76  versicolor
## 77  versicolor
## 78  versicolor
## 79  versicolor
## 80  versicolor
## 81  versicolor
## 82  versicolor
## 83  versicolor
## 84  versicolor
## 85  versicolor
## 86  versicolor
## 87  versicolor
## 88  versicolor
## 89  versicolor
## 90  versicolor
## 91  versicolor
## 92  versicolor
## 93  versicolor
## 94  versicolor
## 95  versicolor
## 96  versicolor
## 97  versicolor
## 98  versicolor
## 99  versicolor
## 100 versicolor
## 101  virginica
## 102  virginica
## 103  virginica
## 104  virginica
## 105  virginica
## 106  virginica
## 107  virginica
## 108  virginica
## 109  virginica
## 110  virginica
## 111  virginica
## 112  virginica
## 113  virginica
## 114  virginica
## 115  virginica
## 116  virginica
## 117  virginica
## 118  virginica
## 119  virginica
## 120  virginica
## 121  virginica
## 122  virginica
## 123  virginica
## 124  virginica
## 125  virginica
## 126  virginica
## 127  virginica
## 128  virginica
## 129  virginica
## 130  virginica
## 131  virginica
## 132  virginica
## 133  virginica
## 134  virginica
## 135  virginica
## 136  virginica
## 137  virginica
## 138  virginica
## 139  virginica
## 140  virginica
## 141  virginica
## 142  virginica
## 143  virginica
## 144  virginica
## 145  virginica
## 146  virginica
## 147  virginica
## 148  virginica
## 149  virginica
## 150  virginica

Summary Functions

x=data_frame(A=c("a","b","c"),B=c("t","u","v"),c=c(1:3))

y=data_frame(A=c("a","b","d"),B=c("t","u","w"),D=c(3,2,1))
x

## # A tibble: 3 x 3
##   A     B         c
##   <chr> <chr> <int>
## 1 a     t         1
## 2 b     u         2
## 3 c     v         3

## # A tibble: 3 x 3
##   A     B         D
##   <chr> <chr> <dbl>
## 1 a     t      3.00
## 2 b     u      2.00
## 3 d     w      1.00

bind_cols()

bind_cols(x,y)

## # A tibble: 3 x 6
##   A     B         c A1    B1        D
##   <chr> <chr> <int> <chr> <chr> <dbl>
## 1 a     t         1 a     t      3.00
## 2 b     u         2 b     u      2.00
## 3 c     v         3 d     w      1.00

left_join(x, y, by = NULL,copy=FALSE, suffix=c(“.x”,“.y”),…)

Join matching values from y to x.

left_join(x, y, by = NULL)

## # A tibble: 3 x 4
##   A     B         c     D
##   <chr> <chr> <int> <dbl>
## 1 a     t         1  3.00
## 2 b     u         2  2.00
## 3 c     v         3 NA

right_join(x, y, by = NULL, copy =

FALSE, suffix=c(“.x”,“.y”),…) Join matching values from x to y.

right_join(x, y, by = NULL)

## # A tibble: 3 x 4
##   A     B         c     D
##   <chr> <chr> <int> <dbl>
## 1 a     t         1  3.00
## 2 b     u         2  2.00
## 3 d     w        NA  1.00

Vinner_join(x, y, by = NULL, copy =FALSE, suffix=c(“.x”,“.y”),…)

Join data. Retain only rows with matches.

inner_join(x, y, by = NULL)

## # A tibble: 2 x 4
##   A     B         c     D
##   <chr> <chr> <int> <dbl>
## 1 a     t         1  3.00
## 2 b     u         2  2.00

merge(x,y)

##   A B c D
## 1 a t 1 3
## 2 b u 2 2

full_join(x, y, by = NULL,copy=FALSE, suffix=c(“.x”,“.y”),…)

Join data. Retain all values, all rows.

full_join(x, y)

## # A tibble: 4 x 4
##   A     B         c     D
##   <chr> <chr> <int> <dbl>
## 1 a     t         1  3.00
## 2 b     u         2  2.00
## 3 c     v         3 NA   
## 4 d     w        NA  1.00

#union(x,y)


union_all(x,y)

## # A tibble: 6 x 4
##   A     B         c     D
##   <chr> <chr> <int> <dbl>
## 1 a     t         1 NA   
## 2 b     u         2 NA   
## 3 c     v         3 NA   
## 4 a     t        NA  3.00
## 5 b     u        NA  2.00
## 6 d     w        NA  1.00

CUse by = c(“col1”, “col2”) to

specify the column(s) to match on. left_join(x, y, by = “A”)

left_join(x, y, by = "A")

## # A tibble: 3 x 5
##   A     B.x       c B.y       D
##   <chr> <chr> <int> <chr> <dbl>
## 1 a     t         1 t      3.00
## 2 b     u         2 u      2.00
## 3 c     v         3 <NA>  NA

Use a named vector, by = c(“col1” =“col2”),

to match on columns with different names in each data set. left_join(x, y, by = c(“C” = “D”))

left_join(x, y, by = c("c" = "D"))

## # A tibble: 3 x 5
##   A.x   B.x       c A.y   B.y  
##   <chr> <chr> <dbl> <chr> <chr>
## 1 a     t      1.00 d     w    
## 2 b     u      2.00 b     u    
## 3 c     v      3.00 a     t

Use suffix to specify suffix to give to

duplicate column names. left_join(x, y, by = c(“C” = “D”), suffix = c(“1”, “2”))

left_join(x, y, by = c("c" = "D"), suffix =c("1", "2"))

## # A tibble: 3 x 5
##   A1    B1        c A2    B2   
##   <chr> <chr> <dbl> <chr> <chr>
## 1 a     t      1.00 d     w    
## 2 b     u      2.00 b     u    
## 3 c     v      3.00 a     t

merge=inner_join

merge(x,y)

##   A B c D
## 1 a t 1 3
## 2 b u 2 2

merge(x,y,by="A")

##   A B.x c B.y D
## 1 a   t 1   t 3
## 2 b   u 2   u 2

Use bind_rows() to paste tables below each

other as they are.

bind_rows(x,y)

## # A tibble: 6 x 4
##   A     B         c     D
##   <chr> <chr> <int> <dbl>
## 1 a     t         1 NA   
## 2 b     u         2 NA   
## 3 c     v         3 NA   
## 4 a     t        NA  3.00
## 5 b     u        NA  2.00
## 6 d     w        NA  1.00

setdiff(x, y, …)

Rows that appear in x but not z.

setdiff(x=c(1,2,3,4,5), y=c(3,4,5,6,7))

## [1] 1 2

union(x, y, …)

Rows that appear in x or z.(Duplicates removed). union_all() retains duplicates.

union(x=c(1,2,3,4,5), y=c(3,4,5,6,7))

## [1] 1 2 3 4 5 6 7

Use setequal()

to test whether two data sets contain the exact same rows (in any order)

setequal(x,y)

## FALSE: Cols in y but not x: `D`. Cols in x but not y: `c`.

Use a “Filtering Join”

to filter one table against the rows of another.

semi_join(x, y, by = NULL, …)

Return rows of x that have a match in y.

semi_join(x, y, by = NULL)

## # A tibble: 2 x 3
##   A     B         c
##   <chr> <chr> <int>
## 1 a     t         1
## 2 b     u         2

anti_join(x, y, by = NULL, …)

Return rows of x that do not have a match in y.

anti_join(x, y, by = NULL)

## # A tibble: 1 x 3
##   A     B         c
##   <chr> <chr> <int>
## 1 c     v         3

intersect(x, y, …)

Rows that appear in both x and z.

x1=data_frame(A=c("a","b","c"),B=c("t","u","v"),C=c(1:3))

y1=data_frame(A=c("a","b","c"),B=c("t","u","w"),C=c(3,2,1))

intersect(x1, y1)

## # A tibble: 1 x 3
##   A     B         C
##   <chr> <chr> <dbl>
## 1 b     u      2.00

inner_join(x1,y1,by="A")

## # A tibble: 3 x 5
##   A     B.x     C.x B.y     C.y
##   <chr> <chr> <int> <chr> <dbl>
## 1 a     t         1 t      3.00
## 2 b     u         2 u      2.00
## 3 c     v         3 w      1.00

full_join(x1,y1)

## # A tibble: 5 x 3
##   A     B         C
##   <chr> <chr> <dbl>
## 1 a     t      1.00
## 2 b     u      2.00
## 3 c     v      3.00
## 4 a     t      3.00
## 5 c     w      1.00

programming with dplyr,rlang

!! and Wrting Functions with dplyr

In dplyr (and in tidyeval in general) you use !! to say that you want to unquote an input so that it’s evaluated, not quoted. This gives us a function that actually does what we want.

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5), 
  b = sample(5)
)

my_summarise <- function(df, group_var) {
  df %>%
    group_by(!!group_var) %>%
    summarise(a = mean(a))
}

my_summarise(df, quo(g1))

## # A tibble: 2 x 2
##      g1     a
##   <dbl> <dbl>
## 1  1.00  3.00
## 2  2.00  3.00

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5), 
  b = sample(5)
)

df %>%
  group_by(g1) %>%
  summarise(a = mean(a))

## # A tibble: 2 x 2
##      g1     a
##   <dbl> <dbl>
## 1  1.00  4.00
## 2  2.00  2.33

###doesnt work 
my_summarise <- function(df, group_var) {
  df %>%
    group_by(group_var) %>%
    summarise(a = mean(a))
}

#my_summarise(df, g1)

quo() and enquo() works like “: it quotes its input rather than evaluating it.

quo(g1)

## <quosure: global>
## ~g1

quo(a + b + c)

## <quosure: global>
## ~a + b + c

quo("a")

## <quosure: empty>
## ~"a"

#  Error in (function (x, strict = TRUE) : must be called in a function
#enquo(g1)


noquote(paste("~","g1"))  #equivalent

## [1] ~ g1

quote(a + b + c)

## a + b + c

quote("a + b + c")

## [1] "a + b + c"

f1 <- function(x, y = x)             { x <- x + 1; y }
s1 <- function(x, y = substitute(x)) { x <- x + 1; y }
s2 <- function(x, y) { if(missing(y)) y <- substitute(x); x <- x + 1; y }
a <- 10
f1(a)  # 11

## [1] 11

s1(a)  # 11

## [1] 11

s2(a)  # a

## a

typeof(s2(a))  # "symbol"

## [1] "symbol"

my_summarise <- function(df, group_var) {
  df %>%
    group_by(!!group_var) %>%
    summarise(a = mean(a))
}

my_summarise(df, quo(g1))

## # A tibble: 2 x 2
##      g1     a
##   <dbl> <dbl>
## 1  1.00  4.00
## 2  2.00  2.33

my_summarise <- function(df, group_by) {
  group_by <- enquo(group_by)
  print(group_by)

  df %>%
    group_by(!!group_by) %>%
    summarise(a = mean(a))
}

my_summarise(df, g1)

## <quosure: global>
## ~g1

## # A tibble: 2 x 2
##      g1     a
##   <dbl> <dbl>
## 1  1.00  4.00
## 2  2.00  2.33

(If you’re familiar with quote() and substitute() in base R, quo() is equivalent to quote() and enquo() is equivalent to substitute().)

Different input variable

summarise(df, mean = mean(a), sum = sum(a), n = n())

## # A tibble: 1 x 3
##    mean   sum     n
##   <dbl> <int> <int>
## 1  3.00    15     5

summarise(df, mean = mean(a * b), sum = sum(a * b), n = n())

## # A tibble: 1 x 3
##    mean   sum     n
##   <dbl> <int> <int>
## 1  10.6    53     5

To turn this into a function, we start by testing the basic approach interactively: we quote the variable with quo(), then unquoting it in the dplyr call with !!. Notice that we can unquote anywhere inside a complicated expression.

my_var <- quo(a)
summarise(df, mean = mean(!!my_var), sum = sum(!!my_var), n = n())

## # A tibble: 1 x 3
##    mean   sum     n
##   <dbl> <int> <int>
## 1  3.00    15     5

quo(summarise(df, 
  mean = mean(!!my_var),
  sum = sum(!!my_var),
  n = n()
))

## <quosure: global>
## ~summarise(df, mean = mean(~a), sum = sum(~a), n = n())

quote(summarise(df, 
  mean = mean(!!my_var),
  sum = sum(!!my_var),
  n = n()
))

## summarise(df, mean = mean(!(!my_var)), sum = sum(!(!my_var)), 
##     n = n())

noquote(paste("~",quo(summarise(df, 
  mean = mean(!!my_var),
  sum = sum(!!my_var),
  n = n()
))))

## [1] ~ ~                                                     
## [2] ~ summarise(df, mean = mean(~a), sum = sum(~a), n = n())

my_summarise2 <- function(df, expr) {
  expr <- enquo(expr)
  
  summarise(df, 
    mean = mean(!!expr),
    sum = sum(!!expr),
    n = n()
  )
}
my_summarise2(df, a)

## # A tibble: 1 x 3
##    mean   sum     n
##   <dbl> <int> <int>
## 1  3.00    15     5

Different input and output variable

mutate(df, mean_a = mean(a), sum_a = sum(a))

## # A tibble: 5 x 6
##      g1    g2     a     b mean_a sum_a
##   <dbl> <dbl> <int> <int>  <dbl> <int>
## 1  1.00  1.00     3     4   3.00    15
## 2  1.00  2.00     5     5   3.00    15
## 3  2.00  1.00     2     1   3.00    15
## 4  2.00  2.00     1     2   3.00    15
## 5  2.00  1.00     4     3   3.00    15

mutate(df, mean_b = mean(b), sum_b = sum(b))

## # A tibble: 5 x 6
##      g1    g2     a     b mean_b sum_b
##   <dbl> <dbl> <int> <int>  <dbl> <int>
## 1  1.00  1.00     3     4   3.00    15
## 2  1.00  2.00     5     5   3.00    15
## 3  2.00  1.00     2     1   3.00    15
## 4  2.00  2.00     1     2   3.00    15
## 5  2.00  1.00     4     3   3.00    15

my_mutate <- function(df, expr) {
  expr <- enquo(expr)
  mean_name <- paste0("mean_", quo_name(expr))
  sum_name <- paste0("sum_", quo_name(expr))
  
  mutate(df, 
    !!mean_name := mean(!!expr), 
    !!sum_name := sum(!!expr)
  )
}


my_mutate(df, a)

## # A tibble: 5 x 6
##      g1    g2     a     b mean_a sum_a
##   <dbl> <dbl> <int> <int>  <dbl> <int>
## 1  1.00  1.00     3     4   3.00    15
## 2  1.00  2.00     5     5   3.00    15
## 3  2.00  1.00     2     1   3.00    15
## 4  2.00  2.00     1     2   3.00    15
## 5  2.00  1.00     4     3   3.00    15

Capturing multiple variables

Extend my_summarise() to accept any number of grouping variables.Make three changes:

Use … in the function definition so our function can accept any number of arguments.
Use quos() to capture all the … as a list of formulas.
Use !!! instead of !! to splice the arguments into group_by().

my_summarise <- function(df, ...) {
  group_by <- quos(...)

  df %>%
    group_by(!!!group_by) %>%
    summarise(a = mean(a))
}

my_summarise(df, g1, g2)

## # A tibble: 4 x 3
## # Groups:   g1 [?]
##      g1    g2     a
##   <dbl> <dbl> <dbl>
## 1  1.00  1.00  3.00
## 2  1.00  2.00  5.00
## 3  2.00  1.00  3.00
## 4  2.00  2.00  1.00

!!! takes a list of elements and splices them into to the current call

args <- list(na.rm = TRUE, trim = 0.25)

quo(mean(x, !!! args))

## <quosure: global>
## ~mean(x, na.rm = TRUE, trim = 0.25)

args <- list(quo(x), na.rm = TRUE, trim = 0.25)

quo(mean(!!! args))

## <quosure: global>
## ~mean(~x, na.rm = TRUE, trim = 0.25)

Quoting

f <- function(x) {
  quo(x)
}

x1 <- f(10)
x2 <- f(100)

x1

## <quosure: local>
## ~x

x2

## <quosure: local>
## ~x

The environments using rlang::get_env() — they’re different.

library(rlang)

get_env(x1)

## <environment: 0x7f9924a3fd88>

get_env(x2)

## <environment: 0x7f9924135c60>

evaluating those formulas using rlang::eval_tidy(), we see that they yield different values

eval_tidy(x1)

## [1] 10

eval_tidy(x2)

## [1] 100

user_var <- 1000
mtcars %>% summarise(cyl = mean(cyl) * user_var)

##      cyl
## 1 6187.5

quosures

var <- ~toupper(letters[1:5])
var

## ~toupper(letters[1:5])

# You can extract its expression:
get_expr(var)

## toupper(letters[1:5])

# Or inspect its enclosure:
get_env(var)

## <environment: R_GlobalEnv>

Unquoting

# Here we capture `letters[1:5]` as an expression:
quo(toupper(letters[1:5]))

## <quosure: global>
## ~toupper(letters[1:5])

# Here we capture the value of `letters[1:5]`
quo(toupper(!!letters[1:5]))

## <quosure: global>
## ~toupper(c("a", "b", "c", "d", "e"))

quo(toupper(UQ(letters[1:5])))

## <quosure: global>
## ~toupper(c("a", "b", "c", "d", "e"))

quo(!!toupper(letters[1:5]))

## <quosure: empty>
## ~c("A", "B", "C", "D", "E")

my_mutate <- function(x) {
  mtcars %>%
    select(cyl) %>%
    slice(1:4) %>%
    mutate(cyl2 = cyl + (!! x))
}

f <- function(x) quo(x)
expr1 <- f(100)
f_1<- function(x) quo(!!x)
expr2 <- f_1(10)

expr1

## <quosure: local>
## ~x

expr2

## <quosure: empty>
## ~10

my_mutate(expr1)

## # A tibble: 4 x 2
##     cyl  cyl2
##   <dbl> <dbl>
## 1  6.00   106
## 2  6.00   106
## 3  4.00   104
## 4  6.00   106

my_mutate(expr2)

## # A tibble: 4 x 2
##     cyl  cyl2
##   <dbl> <dbl>
## 1  6.00  16.0
## 2  6.00  16.0
## 3  4.00  14.0
## 4  6.00  16.0

my_fun <- quo(fun)
#quo(!!my_fun(x, y, z))
#Error in my_fun(x, y, z) : could not find function "my_fun"


quo(UQ(my_fun)(x, y, z))

## <quosure: global>
## ~(~fun)(x, y, z)

my_var <- quo(x)
quo(filter(df, !!my_var == 1))

## <quosure: global>
## ~filter(df, FALSE)

quo(filter(df, UQ(my_var) == 1))

## <quosure: global>
## ~filter(df, (~x) == 1)

UQ() yields a quosure containing a formula. That ensures that when the quosure is evaluated, it’ll be looked up in the right environment. In certain code-generation scenarios you just want to use expression and ignore the environment. That’s the job of UQE().

#rlang::quasiquotation
my_var <- quo(x)

quo(UQE(my_fun)(x, y, z))

## <quosure: global>
## ~fun(x, y, z)

quo(filter(df, UQE(my_var) == 1))

## <quosure: global>
## ~filter(df, x == 1)

UQE(my_var)

## x

#### Unquote-splicing

The second unquote operation is unquote-splicing. Its functional form is UQS() and the syntactic shortcut is !!!. It takes a vector and inserts each element of the vector in the surrounding function call:

quo(list(!!! letters[1:5]))

## <quosure: global>
## ~list("a", "b", "c", "d", "e")

quo(list(!! letters[1:5]))

## <quosure: global>
## ~list(c("a", "b", "c", "d", "e"))

quo(list( UQS(letters[1:5])))

## <quosure: global>
## ~list("a", "b", "c", "d", "e")

x <- list(foo = 1L, bar = quo(baz))
quo(list(!!! x))

## <quosure: global>
## ~list(foo = 1L, bar = ~baz)

args <- list(mean = quo(mean(cyl)), count =quo( n()))

args

## $mean
## <quosure: global>
## ~mean(cyl)
## 
## $count
## <quosure: global>
## ~n()

mtcars %>%
  group_by(am) %>%
  summarise(!!! args)

## # A tibble: 2 x 3
##      am  mean count
##   <dbl> <dbl> <int>
## 1  0     6.95    19
## 2  1.00  5.08    13

Setting variable names

The final unquote operation is setting argument names. You’ve seen one way to do that above, but you can also use the definition operator := instead of =. := supports unquoting on both the LHS and the RHS.

The rules on the LHS are slightly different: the unquoted operand should evaluate to a string or a symbol.

mean_nm <- "mean"
count_nm <- "count"

mtcars %>%
  group_by(am) %>%
  summarise(
    !!mean_nm := mean(cyl),
    !!count_nm := n()
  )

## # A tibble: 2 x 3
##      am  mean count
##   <dbl> <dbl> <int>
## 1  0     6.95    19
## 2  1.00  5.08    13

#tidy_names()

#tidyverse_update()


#tidyverse_conflicts()


#tidyverse_deps()

#install.packages("tidyverse")

tidyverse_logo()

## ⬢ __  _    __   .    ⬡           ⬢  . 
##  / /_(_)__/ /_ ___  _____ _______ ___ 
## / __/ / _  / // / |/ / -_) __(_-</ -_)
## \__/_/\_,_/\_, /|___/\__/_/ /___/\__/ 
##      ⬢  . /___/      ⬡      .       ⬢

Dplyr Tutorial 2

Nana Boateng

February 02, 2018

Loading Packages

Manipulate Variables

select(.data, …)

starts_with(" “)

select helpers

mutate(.data, …)

transmute(.data, …)

mutate_all(.tbl, .funs, …) Apply funs to every

mutate_if(.tbl, .predicate, .funs, …)

mutate_at(.tbl, .cols, .funs, …) Apply funs to

add_column(.data, …, .before = NULL, .after =NULL)

Manipulate Cases

filter(.data, …) Extract rows that meet logical criteria. Also filter_().

distinct(.data, …, .keep_all = FALSE) Remove rows with duplicate values. Also distinct_().

sample_frac(tbl, size = 1, replace = FALSE,weight = NULL, .env = parent.frame())

sample_n(tbl, size, replace = FALSE, weight =NULL, .env = parent.frame())

slice(.data, …) Select rows by position. Also slice_().

top_n(x, n, wt) Select and order top n entries (bygroup if grouped data).

arrange(.data, …) Order rows by values of a column (low to high), use with desc() to order from high to low.

add_row(.data, …, .before = NULL, .after = NULL)

Summarise Cases

summarise(.data, …)

count(x, …, wt = NULL, sort = FALSE)

Group Cases

ungroup(x, …)

Vectorized Functions

RANKINGS

MISC

Summary Functions

TO USE WITH SUMMARISE ()

Row Names

rownames_to_column()

column_to_rownames()

Summary Functions

bind_cols()

left_join(x, y, by = NULL,copy=FALSE, suffix=c(“.x”,“.y”),…)

right_join(x, y, by = NULL, copy =

Vinner_join(x, y, by = NULL, copy =FALSE, suffix=c(“.x”,“.y”),…)

full_join(x, y, by = NULL,copy=FALSE, suffix=c(“.x”,“.y”),…)

CUse by = c(“col1”, “col2”) to

Use a named vector, by = c(“col1” =“col2”),

Use suffix to specify suffix to give to

merge=inner_join

Use bind_rows() to paste tables below each

setdiff(x, y, …)

union(x, y, …)

Use setequal()

Use a “Filtering Join”

semi_join(x, y, by = NULL, …)

anti_join(x, y, by = NULL, …)

intersect(x, y, …)

programming with dplyr,rlang

!! and Wrting Functions with dplyr

quo() and enquo() works like “: it quotes its input rather than evaluating it.

Different input variable

Different input and output variable

Capturing multiple variables

Quoting

Unquoting

Setting variable names