%grep% operator

For Sonja Dunemann

Calling R’s built-in grep function is clumsy in interactive work. R already has a convenient %in% operator for testing whether values belong to a vector. In real data analysis, however, we rarely match against a well-defined set of exact values. Much more often we need to match patterns inside strings.

Implementation

# Infix wrapper around grepl(): TRUE for each element of x that matches pattern
`%grep%` <- function(pattern, x) grepl(pattern, x)

That’s all.
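It matters that the wrapper uses grepl rather than grep. grep returns the indices of matching elements, while grepl returns one logical value per element, and that is what row indexing and the logical operators below need. The short comparison below uses an illustrative vector `x`. It also shows a classic pitfall with index-based negation: when nothing matches, `-grep(...)` is `-integer(0)`, which selects nothing at all.

```r
# Same definition as above
`%grep%` <- function(pattern, x) grepl(pattern, x)

x <- c("alpha", "beta", "delta", "pi")

grep("ta", x)     # indices of matching elements: 2 3
grepl("ta", x)    # one logical per element: FALSE TRUE TRUE FALSE
"ta" %grep% x     # identical to grepl("ta", x)

# Negation with index-based grep: no match gives -integer(0),
# which drops every element instead of keeping them all.
x[-grep("zz", x)]     # character(0)
x[!("zz" %grep% x)]   # "alpha" "beta" "delta" "pi"
```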

Usage

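The %grep% operator works much like %in%: it returns a logical vector that can be used directly to subset rows. The difference is that it tests whether the pattern occurs within each string, rather than testing for exact membership. The pattern goes on the left, and the result has one element per element of the right-hand operand.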
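A side-by-side comparison on a small illustrative vector shows the difference in operand order. %in% returns one result per element of its left operand. %grep% returns one result per element of its right operand.

```r
`%grep%` <- function(pattern, x) grepl(pattern, x)

fruits <- c("apple", "banana", "cherry")

# %in%: exact membership, one result per element of the LEFT operand
"banana" %in% fruits   # TRUE
fruits %in% "banana"   # FALSE TRUE FALSE

# %grep%: pattern on the left, one result per element of the RIGHT operand
"an" %grep% fruits     # FALSE TRUE FALSE
```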

Direct string matching:

library(magrittr)  # provides the %>% pipe used in these examples
# Select 'setosa' entries
iris['setosa' %grep% iris$Species, ] %>% head
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
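Because the pattern is matched anywhere in the string, a fragment is enough to select a group. If you need an exact match, anchor the pattern with `^` and `$`. The sketch below uses the built-in iris data (50 rows per species). Base R subsetting is used here, so magrittr is not needed.

```r
`%grep%` <- function(pattern, x) grepl(pattern, x)

# A fragment matches anywhere in the string: 'color' selects versicolor
versi <- iris['color' %grep% iris$Species, ]
unique(as.character(versi$Species))   # "versicolor"

# Anchored pattern: only values that are exactly 'setosa'
setosa <- iris['^setosa$' %grep% iris$Species, ]
nrow(setosa)   # 50
```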

Regular expressions:

# Select all entries where species ends with 'a'
X <- iris['a$' %grep% iris$Species, ]
unique(X$Species)
## [1] setosa    virginica
## Levels: setosa versicolor virginica
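An infix operator cannot take extra arguments such as `fixed` or `ignore.case`. One option is to define separate operators for those cases. The names `%fgrep%` and `%igrep%` below are hypothetical companions, not part of the original post. The example shows why a literal match can matter: in a regular expression, `.` matches any character.

```r
`%grep%` <- function(pattern, x) grepl(pattern, x)

# Hypothetical companions (not from the original post):
# %fgrep% treats the pattern as a literal string, %igrep% ignores case.
`%fgrep%` <- function(pattern, x) grepl(pattern, x, fixed = TRUE)
`%igrep%` <- function(pattern, x) grepl(pattern, x, ignore.case = TRUE)

codes <- c("a.b", "axb", "A.B")

"a.b" %grep% codes    # TRUE TRUE FALSE: '.' matches any character
"a.b" %fgrep% codes   # TRUE FALSE FALSE: only a literal dot matches
"a.b" %igrep% codes   # TRUE TRUE TRUE: case is ignored
```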

More complicated example

This lets us build more complicated selections by chaining conditions. I always forget how to write a regular expression for “match one word but not another”. With the %grep% operator, we can instead combine simple patterns using R’s logical operators (&, |, !).
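For comparison, here is the pure regular-expression route for “contains foo but not bar”. It needs a negative lookahead, which base R supports only with `perl = TRUE`. The fixed vector `fb` is illustrative. The logical combination of two plain patterns gives the same result.

```r
`%grep%` <- function(pattern, x) grepl(pattern, x)

fb <- c("foo foo", "foo bar", "bar foo", "bar bar")

# Regex route: the lookahead (?!.*bar) rejects any string containing 'bar',
# then .*foo requires 'foo' somewhere in the string.
by_regex <- grepl("^(?!.*bar).*foo", fb, perl = TRUE)

# Logical route: two simple patterns combined with & and !
by_logic <- 'foo' %grep% fb & !'bar' %grep% fb

by_regex                        # TRUE FALSE FALSE FALSE
identical(by_regex, by_logic)   # TRUE
```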

Let’s create some sample data:

# Create some silly example
X <- data.frame(
  value = rnorm(100),
  fb = paste(
    sample(c('foo', 'bar'), 100, replace = TRUE),
    sample(c('foo', 'bar'), 100, replace = TRUE)))
X %>% head
##        value      fb
## 1  2.0187125 bar foo
## 2  0.6320960 foo foo
## 3 -0.2587342 bar bar
## 4 -0.7468302 foo bar
## 5 -0.1864790 bar bar
## 6 -0.9768208 bar foo

And apply %grep%:

# Select foo without bar (same as `foo foo`)
X['foo' %grep% X$fb & !'bar' %grep% X$fb, ] %>% head
##          value      fb
## 2   0.63209603 foo foo
## 8   0.28743489 foo foo
## 21 -1.56684516 foo foo
## 30 -0.26888993 foo foo
## 31 -1.76683449 foo foo
## 32  0.03118082 foo foo
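The unparenthesised expression above works because R gives `%any%` operators higher precedence than `!` and `&`. As a result, `!'bar' %grep% X$fb` parses as `!('bar' %grep% X$fb)`. Other logical combinations follow the same pattern. The sketch below uses a fixed vector instead of the random `X`, so the results are deterministic. `xor()` keeps strings containing exactly one of the two words, and `&` keeps strings containing both.

```r
`%grep%` <- function(pattern, x) grepl(pattern, x)

fb <- c("foo foo", "foo bar", "bar foo", "bar bar")

# %grep% binds tighter than !, so both forms are the same expression
identical(!'bar' %grep% fb, !('bar' %grep% fb))   # TRUE

# Exactly one of the two words present
fb[xor('foo' %grep% fb, 'bar' %grep% fb)]   # "foo foo" "bar bar"

# Both words present: the mixed combinations
fb['foo' %grep% fb & 'bar' %grep% fb]       # "foo bar" "bar foo"
```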