The goal of this tutorial is to learn how to use the grep function, which is very useful for finding patterns in text.
# In this tutorial we are going to use the iris dataset
# We will count the number of plants of each species
data("iris")
iris$Species <- as.character(iris$Species)
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : chr "setosa" "setosa" "setosa" "setosa" ...
# grep returns the positions (indices) of the elements where the pattern is found
grep("setosa", iris$Species)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50
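Because grep returns positions, `length()` of its result is the number of matches, which is one way to count the plants of each species as announced at the start. The sketch below loops over the species names with `sapply`; the `^` and `$` anchors force a whole-name match so one name could not be counted inside another (a precaution only, since these three names do not overlap). The variable names are illustrative, and `table()` is used purely as a cross-check.

```r
# Load iris and store Species as plain character strings, as in the tutorial
data("iris")
iris$Species <- as.character(iris$Species)

# Distinct species names in the data
species <- unique(iris$Species)

# For each species, grep returns the matching row positions and length() counts them.
# "^" and "$" anchor the pattern so only the exact name is matched.
counts <- sapply(species, function(sp) length(grep(paste0("^", sp, "$"), iris$Species)))
counts  # each species has 50 plants

# Cross-check: table() counts every distinct value directly
table(iris$Species)
```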
# A great strength of grep is that it also matches partial patterns: "set" is found inside "setosa"
grep("set", iris$Species)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50
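Partial matching works because grep treats the pattern as a regular expression by default, so a short pattern can also match more than intended. A minimal sketch on the same data: the letter "s" occurs in both "setosa" and "versicolor", while the anchors `^` (start of string) and `$` (end of string) narrow the match. Variable names are illustrative.

```r
data("iris")
iris$Species <- as.character(iris$Species)

# "s" occurs in "setosa" and in "versicolor" but not in "virginica",
# so this partial match returns rows 1 to 100 (two species, not one)
s_any <- grep("s", iris$Species)
range(s_any)

# "^s" only matches names that START with "s", i.e. setosa (rows 1 to 50)
s_start <- grep("^s", iris$Species)
range(s_start)

# "a$" only matches names that END with "a": setosa and virginica
a_end <- grep("a$", iris$Species)
length(a_end)
```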
# However, matching is case-sensitive by default, so "Setosa" finds nothing
grep("Setosa", iris$Species)
## integer(0)
# We can avoid this problem with the ignore.case argument
grep("Setosa", iris$Species, ignore.case = TRUE)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50
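`ignore.case = TRUE` is the simplest fix, but two other common approaches give the same result here: a regular-expression character class such as `[Ss]`, which accepts either case only for the letters you choose, or lower-casing the text with `tolower()` before searching. A small comparison sketch, with illustrative variable names:

```r
data("iris")
iris$Species <- as.character(iris$Species)

# Option 1: the character class [Ss] accepts either case for the first letter only
by_class <- grep("[Ss]etosa", iris$Species)

# Option 2: lower-case the text, then search with a lower-case pattern
by_lower <- grep("setosa", tolower(iris$Species))

# Reference: the ignore.case version used in the tutorial
by_flag <- grep("Setosa", iris$Species, ignore.case = TRUE)

# All three approaches find the same setosa rows
identical(by_class, by_flag)
identical(by_lower, by_flag)
```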
# We can also find the positions where the pattern is NOT found, using the invert argument
grep("Setosa", iris$Species, ignore.case = TRUE, invert = TRUE)
## [1] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
## [18] 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
## [35] 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101
## [52] 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
## [69] 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
## [86] 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
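The indices returned with `invert = TRUE` can be used directly to drop rows from the data frame. A sketch, assuming we want every plant that is not setosa. It also shows why `invert = TRUE` is safer than the negative-index idiom `-grep(...)`: when nothing matches, grep returns `integer(0)`, and indexing with `-integer(0)` selects zero rows rather than all of them. Variable names are illustrative.

```r
data("iris")
iris$Species <- as.character(iris$Species)

# Keep every row whose species is NOT setosa
no_setosa <- iris[grep("setosa", iris$Species, invert = TRUE), ]
nrow(no_setosa)            # 100 rows remain
unique(no_setosa$Species)  # only versicolor and virginica

# Pitfall: the case-sensitive pattern "Setosa" matches nothing, so grep returns
# integer(0), and iris[-integer(0), ] silently selects zero rows
nrow(iris[-grep("Setosa", iris$Species), ])

# invert = TRUE has no such problem: with no matches, every row is kept
nrow(iris[grep("Setosa", iris$Species, invert = TRUE), ])
```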
In this tutorial we have learnt how to find patterns in data using the grep function.