%>% #rstatsThe pipe operator (%>%) of the magrittr has been one of my favorite R tools since I first saw it being used in the ggplot2 graphics. My main reason for using %>% is that it makes my R codes look more readable.
As with other tools, I find time to practice with %>% to make myself more familiar with it. In one such session, I was playing with the %>% and planned to do a test of correlation at the end of a long chunk of code. I was annoyed when, at first, I couldn’t run the cor function using %>%. Let me demonstrate my point with the use of the mtcars package. The following examples will be a simple and seem to be complicated by the use of %>% but in my actual use case, %>% was able to shorten the length code by at least half as much.
library(dplyr)
library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_smooth(method = "lm", se = FALSE)
The graphs shows that as the weight of a car increases, the milleage decreases–a negative linear relationship. To find the Pearson correlation coefficient, which measures the strength of the linear relationship between the two variables wt and mpg, we can use the cor function. (Type ?cor for more info.) There are several ways to do this.
# Get the correlation of mtcar's wt variable and mtcar's mpg variable
cor(mtcars$wt, mtcars$mpg)
## [1] -0.867659376517
This is one of the most usual ways in which I see this function being used. What I don’t like about this style is the repeated call to mtcars. Another way to do this is using the cor in conjunction with the with function. (Type ?with.)
# Get the correlation of the wt and mpg variables within the mtcar data set
with(mtcars, cor(wt, mpg))
## [1] -0.867659376517
So, perhaps using %>% we can write
mtcars %>% cor(wt, mpg)
Surprisingly, that didn’t work! Instead, the output throws the message
Error in cor(., wt, y) : invalid 'use' argument
That’s like R’s saying “Hey, stupid, you are doing it all wrong!” But seriously, what is the error message saying? It appears that %>% placed the object mtcars as a first argument, the . in cor(., wt, y), and that y is an invalid use.
So what is happening? Typing ?cor I found out the following:
cor(x, y = NULL, use = "everything",
method = c("pearson", "kendall", "spearman"))
Arguments
x a numeric vector, matrix or data frame.
y NULL (default) or a vector, matrix or data frame with compatible dimensions to x.
The default is equivalent to y = x (but more efficient).
na.rm logical. Should missing values be removed?
use an optional character string giving a method for computing covariances in the presence of
missing values. This must be (an abbreviation of) one of the strings "everything",
"all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".
method a character string indicating which correlation coefficient (or covariance) is to be
computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated.
V symmetric numeric matrix, usually positive definite such as a covariance matrix.
So, essentially, mtcars %>% cor(wt, mpg) is equivalent to:
cor(x = mtcars, y = wt, use = mpg)
So what to do in order to get a correlation or correlation matrix? Well, the documentation for cor says that x can be a matrix or data frame. So, we can instead say
mtcars %>% select(wt, mpg) %>% cor
## wt mpg
## wt 1.000000000000 -0.867659376517
## mpg -0.867659376517 1.000000000000
which produces a correlation matrix.
So what are the lessons here for me?
?`%>%`
?cor
Which of the following will run without errors?
cor.test(~wt + mpg, mtcars)
cor.test(x = wt, y = mpg, data = mtcars)
with(mtcars, cor.test(wt, mpg))
mtcars %>% cor.test(~wt + mpg)
mtcars %>% cor.test(., ~wt + mpg)
mtcars %>% cor.test(~wt + mpg, .)
mtcars %>% cor.test(~wt + mpg, data = .)