%grep% operator
For Sonja Dunemann
Using R’s built-in grep function is inconvenient for interactive work. R already has a convenient %in% operator for testing whether values appear in a vector. However, real data analysis rarely presents us with such well-defined values. Strings are much more common.
`%grep%` <- function(pattern, x) grepl(pattern, x)
That’s all.
The %grep% operator works the same way as %in%, except that it looks for the pattern inside each string.
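To make the analogy concrete, here is a small comparison on a toy vector (species is a made-up example). One difference is worth noticing: x %in% table returns one value per element of its left operand, while pattern %grep% x returns one value per element of its right operand, because grepl vectorises over x and uses only the first element of pattern.

```r
`%grep%` <- function(pattern, x) grepl(pattern, x)

species <- c("setosa", "versicolor", "virginica")

# Exact membership: one result per element of the left operand
species %in% "setosa"   # TRUE FALSE FALSE

# Pattern search: one result per element of the right operand
"set" %grep% species    # TRUE FALSE FALSE
"si" %grep% species     # FALSE TRUE FALSE ("versicolor" contains "si")
```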
Direct string matching:
# %>% comes from magrittr (dplyr re-exports it as well)
library(magrittr)
# Select 'setosa' entries
iris['setosa' %grep% iris$Species, ] %>% head
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
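Note that the pattern is always interpreted as a regular expression, so "direct" matching only behaves literally when the string contains no metacharacters such as ".". Below is a sketch of a fixed-string companion; the name %fgrep% is hypothetical (my own, not part of base R), and it simply passes fixed = TRUE to grepl.

```r
`%grep%` <- function(pattern, x) grepl(pattern, x)
# Hypothetical variant: match the pattern literally instead of as a regex
`%fgrep%` <- function(pattern, x) grepl(pattern, x, fixed = TRUE)

versions <- c("1.0", "100", "1-0")

"1.0" %grep% versions    # TRUE TRUE TRUE: "." matches any character
"1.0" %fgrep% versions   # TRUE FALSE FALSE: only the literal "1.0"
```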
Regular expressions:
# Select all entries where species ends with 'a'
X <- iris['a$' %grep% iris$Species, ]
unique(X$Species)
## [1] setosa    virginica
## Levels: setosa versicolor virginica
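Anchors and alternation work as in any other regex. As a quick sketch, counting matching rows instead of listing them (iris has 50 rows per species):

```r
`%grep%` <- function(pattern, x) grepl(pattern, x)

# Species starting with 'v': versicolor and virginica
sum('^v' %grep% iris$Species)                 # 100
# Either of two alternatives
sum('setosa|versicolor' %grep% iris$Species)  # 100
# Matching is case-sensitive: 'Setosa' finds nothing
sum('Setosa' %grep% iris$Species)             # 0
```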
This allows us to build more complicated chained selections. I always forget how to write a regular expression for “one but not two”. With the %grep% operator, we can use R’s logical operators (&, |, !) instead.
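For comparison, here is a sketch of what the single-regex version of "foo but not bar" looks like on a small made-up vector. It needs a negative lookahead, which R only supports through the PCRE engine (perl = TRUE); the logical combination gives the same answer without it.

```r
`%grep%` <- function(pattern, x) grepl(pattern, x)

fb <- c("foo foo", "foo bar", "bar foo", "bar bar")

# Logical combination: contains 'foo' and does not contain 'bar'
with_ops <- 'foo' %grep% fb & !'bar' %grep% fb

# Single regex: the lookahead rejects any string containing 'bar'
with_regex <- grepl('^(?!.*bar).*foo', fb, perl = TRUE)

identical(with_ops, with_regex)  # TRUE
fb[with_ops]                     # "foo foo"
```

Note that %any% operators bind more tightly than ! and &, so !'bar' %grep% fb is read as !('bar' %grep% fb) without extra parentheses.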
Let’s create some sample data:
# Create a silly example (no seed is set, so your values will differ)
X <- data.frame(
  value = rnorm(100),
  fb = paste(
    sample(c('foo', 'bar'), 100, replace = TRUE),
    sample(c('foo', 'bar'), 100, replace = TRUE)))
X %>% head
##        value      fb
## 1  2.0187125 bar foo
## 2  0.6320960 foo foo
## 3 -0.2587342 bar bar
## 4 -0.7468302 foo bar
## 5 -0.1864790 bar bar
## 6 -0.9768208 bar foo
And apply %grep%:
# Select foo without bar (same as `foo foo`)
X['foo' %grep% X$fb & !'bar' %grep% X$fb, ] %>% head
##          value      fb
## 2   0.63209603 foo foo
## 8   0.28743489 foo foo
## 21 -1.56684516 foo foo
## 30 -0.26888993 foo foo
## 31 -1.76683449 foo foo
## 32  0.03118082 foo foo
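Other selections follow the same pattern. As a sketch on a fixed vector (instead of the random sample above, so the results are deterministic): & with both patterns keeps the mixed pairs, and base R’s xor() keeps strings that contain exactly one of the two words.

```r
`%grep%` <- function(pattern, x) grepl(pattern, x)

fb <- c("foo foo", "foo bar", "bar foo", "bar bar")

# Both words present (mixed pairs)
fb['foo' %grep% fb & 'bar' %grep% fb]        # "foo bar" "bar foo"
# Exactly one of the two words (pure pairs)
fb[xor('foo' %grep% fb, 'bar' %grep% fb)]    # "foo foo" "bar bar"
# Starts with 'foo' but is not 'foo foo'
fb['^foo' %grep% fb & !'foo foo' %grep% fb]  # "foo bar"
```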