This tutorial covers the basics of the pipe operator (%>%) in R and shows how to use it.
# The pipe operator %>% is defined in the magrittr package and re-exported by dplyr, so loading dplyr makes it available
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# In this example we will use iris, the flower-measurement dataset that ships with R.
data("iris")
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
x <- c(1,4,7,0,8,6,5,6,2,8,1,9,4)
# A function of one argument works like this: function(argument)
mean(x)
## [1] 4.692308
# However, using the pipe operator we can write argument %>% function()
x %>% mean()
## [1] 4.692308
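The pipe passes its left-hand side as the first argument of the function on its right, so any further arguments can still be written inside the parentheses. A minimal sketch, assuming the same vector x as above; the trim value of 0.1 is an arbitrary choice for illustration:

```r
library(dplyr)

x <- c(1, 4, 7, 0, 8, 6, 5, 6, 2, 8, 1, 9, 4)

# Plain call: x is the first argument, trim is an extra argument
plain <- mean(x, trim = 0.1)

# Piped call: x is supplied as the first argument automatically
piped <- x %>% mean(trim = 0.1)

# Both forms compute the same value
identical(plain, piped)
## [1] TRUE
```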
# When applying more than one function to the same object, we can rewrite
# g(f(x)) as x %>% f() %>% g()
# Notice that the pipe lists the functions from the innermost call to the outermost, in the order they run
# For example, we can group and summarize in a single line
# The table is named once at the start of the chain, so we don't need to repeat it in every call
# Likewise, summarize needs no grouping argument, because group_by() has already set the groups
iris %>% group_by(Species) %>% summarize_all(mean)
## # A tibble: 3 × 5
## Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## <fctr> <dbl> <dbl> <dbl> <dbl>
## 1 setosa 5.006 3.428 1.462 0.246
## 2 versicolor 5.936 2.770 4.260 1.326
## 3 virginica 6.588 2.974 5.552 2.026
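To make the correspondence between g(f(x)) and x %>% f() %>% g() concrete, the sketch below writes the same computations in both nested and piped form and checks that they agree. The choice of sum, sqrt, and round is only for illustration, and mean is passed to summarize_all directly rather than through the funs() wrapper, which recent dplyr versions have deprecated.

```r
library(dplyr)

x <- c(1, 4, 7, 0, 8, 6, 5, 6, 2, 8, 1, 9, 4)

# Nested form: read from the inside out (sum, then sqrt, then round)
nested <- round(sqrt(sum(x)), 2)

# Piped form: read from left to right, in the order the steps run
piped <- x %>% sum() %>% sqrt() %>% round(2)

# Both give 7.81
identical(nested, piped)

# The grouped summary, written without and with the pipe
nested_tbl <- summarize_all(group_by(iris, Species), mean)
piped_tbl  <- iris %>% group_by(Species) %>% summarize_all(mean)

# The two tables hold the same values
all.equal(nested_tbl, piped_tbl)
```

The nested versions have to be read from the innermost call outwards, while the piped versions list the steps in the order they are applied.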
In this tutorial we have learnt some basics of the pipe operator. It is useful when we want to apply several functions in sequence to the same object.