Comparing Vectors

Prologue

  • We may have two vectors that we wish to compare.
  • We may want to identify the elements that are unique to one vector (mutually exclusive) or shared by both vectors (intersect).
  • There are many downstream applications for finding mutually exclusive or overlapping elements. Personally, I use them often for subsetting data frames. Refer to Subsetting Data Frames for more information.
  • Let’s first create our vectors for this exercise.
set.seed(88)

vec1 <- c("A", "B", "C", "D", "E")
vec1
## [1] "A" "B" "C" "D" "E"
vec2 <- c("C", "D", "E", "F", "G")
vec2
## [1] "C" "D" "E" "F" "G"
vec3 <- c("E", "D", "C", "A", "B")

vec4 <- sample(toupper(letters[1:5]), size=20, replace=TRUE)
vec4
##  [1] "C" "A" "D" "C" "E" "E" "A" "D" "D" "E" "C" "D" "C" "A" "A" "B" "A"
## [18] "C" "D" "D"
vec5 <- rev(vec1)
vec5
## [1] "E" "D" "C" "B" "A"

Finding the overlapping elements

  • Use the intersect() function to look for elements present in both vectors.
  • The first argument x specifies the first vector.
  • The second argument y specifies the second vector.
  • The order in which the vectors are specified does not change which elements are returned, although the output follows the order in which those elements appear in x.
intersect(x=vec1, y=vec2)
## [1] "C" "D" "E"
intersect(x=vec2, y=vec1)
## [1] "C" "D" "E"
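In the example above both calls print the same result because vec1 and vec2 list their shared letters in the same order. The short sketch below, reusing vec1 and its reversed copy vec5 from the setup, shows that the output of intersect() follows the order of x, and that repeated elements in x appear only once in the result.

```r
# Two vectors holding the same letters in opposite order
vec1 <- c("A", "B", "C", "D", "E")
vec5 <- rev(vec1)

# intersect() keeps the order in which elements appear in x
intersect(x=vec1, y=vec5)
## [1] "A" "B" "C" "D" "E"
intersect(x=vec5, y=vec1)
## [1] "E" "D" "C" "B" "A"

# Repeated elements in x are returned only once
intersect(x=c("C", "C", "D", "A"), y=c("D", "C"))
## [1] "C" "D"
```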

Finding mutually exclusive elements

  • Use the setdiff() function to find elements present in one vector but not in the other.
  • The first argument x specifies the first vector.
  • The second argument y specifies the second vector.
  • The order in which the vectors are specified matters in this instance.
  • The output contains only the elements unique to the first vector specified.
setdiff(x=vec1, y=vec2)
## [1] "A" "B"
setdiff(x=vec2, y=vec1)
## [1] "F" "G"
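Because setdiff() only looks in one direction, collecting the elements that belong to exactly one of the two vectors takes two calls. The sketch below combines them with union(); the variable name only_one is hypothetical and used only for illustration. The second line shows an equivalent formulation built from union() and intersect().

```r
vec1 <- c("A", "B", "C", "D", "E")
vec2 <- c("C", "D", "E", "F", "G")

# Elements found in exactly one vector:
# those only in vec1, followed by those only in vec2
only_one <- union(setdiff(x=vec1, y=vec2), setdiff(x=vec2, y=vec1))
only_one
## [1] "A" "B" "F" "G"

# Equivalent: all elements, minus the ones both vectors share
setdiff(union(vec1, vec2), intersect(vec1, vec2))
## [1] "A" "B" "F" "G"
```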

Finding all unique elements

  • Use the union() function to look for elements present in either one or both vectors.
  • The first argument x specifies the first vector.
  • The second argument y specifies the second vector.
  • The order in which the vectors are specified matters in this instance, but only for the ordering of the output.
  • The output contains every element present in either vector, and each unique element appears only once, whether it is present in one vector or both.
  • The elements of the vector given as x come first, in their original order, followed by any elements of y not already present in x.
union(x=vec1, y=vec2)
## [1] "A" "B" "C" "D" "E" "F" "G"
union(x=vec2, y=vec1)
## [1] "C" "D" "E" "F" "G" "A" "B"
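One way to see why x takes precedence in the ordering: for plain character vectors like these, union() gives the same result as concatenating x and y and then dropping repeats with unique(). The sketch below checks this and shows that repeats within a single vector are collapsed too.

```r
vec1 <- c("A", "B", "C", "D", "E")
vec2 <- c("C", "D", "E", "F", "G")

# union() matches unique() applied to x followed by y
identical(union(x=vec1, y=vec2), unique(c(vec1, vec2)))
## [1] TRUE

# Repeated elements within one vector are also collapsed
union(x=c("A", "A", "B"), y=c("B", "C"))
## [1] "A" "B" "C"
```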

Verifying identical elements

  • You may want to check whether two vectors contain the same set of elements.
  • The setequal() function does exactly this.
  • The setequal() function ignores both the ordering of the elements and how many times each element is repeated. It returns the logical TRUE as long as every element of x is present in y and every element of y is present in x.
  • The first argument x specifies the first vector.
  • The second argument y specifies the second vector.
setequal(x=vec1, y=vec4)
## [1] TRUE
setequal(x=vec4, y=vec1)
## [1] TRUE
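The check performed by setequal() runs in both directions. If you only need to know whether every element of one vector is contained in another, the %in% operator wrapped in all() is a one-directional alternative. The sketch below uses two small made-up vectors, x and y, to show where the two approaches differ.

```r
x <- c("A", "B")
y <- c("A", "B", "C")

# Every element of x is found in y ...
all(x %in% y)
## [1] TRUE

# ... but setequal() also requires every element of y to be in x
setequal(x=x, y=y)
## [1] FALSE

# Order and repeats are ignored
setequal(x=c("B", "A", "A"), y=c("A", "B"))
## [1] TRUE
```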
  • The identical() function is much more stringent than the setequal() function.
  • The identical() function returns the logical TRUE only if both vectors contain the same elements in the same order, with the same type and attributes.
  • The first argument x specifies the first vector.
  • The second argument y specifies the second vector.
  • As you might expect, the order in which the vectors are specified does not matter in this instance.
identical(x=vec1, y=vec1)
## [1] TRUE
identical(x=vec1, y=vec4)
## [1] FALSE
identical(x=vec1, y=vec5)
## [1] FALSE
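When order should not matter but repeated elements should still count, one option is to sort both vectors before calling identical(). The sketch below does this for vec1 and its reversed copy vec5, and also shows that identical() distinguishes storage types, so a double vector and an integer vector with the same values are not identical.

```r
vec1 <- c("A", "B", "C", "D", "E")
vec5 <- rev(vec1)

# Same elements, different order
identical(x=vec1, y=vec5)
## [1] FALSE

# Sorting both vectors first removes the effect of ordering
identical(x=sort(vec1), y=sort(vec5))
## [1] TRUE

# identical() also compares storage type: double vs integer
identical(x=c(1, 2), y=c(1L, 2L))
## [1] FALSE
```

Unlike setequal(), the sorted comparison still returns FALSE if one vector repeats an element more often than the other.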