Let’s say one value of a “Sex” variable is “Male”, and another is “ Male ”.
They represent the same category, but a string comparison treats them as separate values because of the extra spaces.
So we need to remove the leading and trailing spaces.
Let’s create an example string to demonstrate this:
# Create a simple string with spaces either side
s <- " Male "
Sometimes it’s not obvious just by looking at the data that the problem exists.
Here are three ways to inspect a value and spot the problem:
# Print it to see if it has leading/trailing spaces. Note that the quoted output shows a space on either side.
print(s)
## [1] " Male "
# Look at the structure of the string to see if that shows any leading/trailing spaces. Note that it also shows a space on either side.
str(s)
## chr " Male "
# Compare it with "Male" to see if it matches. Note that it returns FALSE - they don't match.
identical(s, "Male")
## [1] FALSE
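The same checks can be scaled up from a single string to a whole column. Here is a minimal sketch, assuming a hypothetical character vector called `sex` (it is not part of the example above) and using only base R: `unique()` exposes the padded duplicates, and `grepl()` with a whitespace pattern flags which entries are affected.

```r
# Hypothetical "Sex" column containing padded variants of "Male"
sex <- c("Male", " Male ", "Female", "Male ")

# unique() lists the padded variants as separate values
unique(sex)

# Flag entries that start or end with a whitespace character
has_padding <- grepl("^\\s|\\s$", sex)
has_padding

# Positions of the affected entries
which(has_padding)
```

Any position returned by `which()` points at a value that needs trimming.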
Remove the spaces using the stringr package’s str_trim() function, like this:
# Load the stringr package so we can use the str_trim() function
library(stringr)
# Use the stringr package's str_trim() function to remove leading and trailing spaces (it trims both sides by default)
s <- str_trim(s)
You can check that it’s worked by using print(), str() or identical():
# Print it to see if it has leading/trailing spaces. It doesn't have any spaces now :)
print(s)
## [1] "Male"
# Look at the structure of the string to see if that shows any leading/trailing spaces. It doesn't have any spaces now :)
str(s)
## chr "Male"
# Compare it with "Male" to see if it matches. Note that it returns TRUE - they now match! :)
identical(s, "Male")
## [1] TRUE
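In practice you will usually want to clean a whole column rather than one string. Here is a minimal sketch, assuming a hypothetical data frame `df` with a `Sex` column (both names are made up for illustration). It relies on `str_trim()` being vectorised, and also shows the function's `side` argument for trimming one end only.

```r
library(stringr)

# Hypothetical data frame with inconsistent padding in the Sex column
df <- data.frame(
  Sex = c("Male", " Male ", "Female ", " Female"),
  stringsAsFactors = FALSE
)

# str_trim() is vectorised, so it cleans every value in the column at once
df$Sex <- str_trim(df$Sex)

# Only the two genuine categories should remain
unique(df$Sex)

# side = "left" or side = "right" trims one end only
left_only <- str_trim("  Male  ", side = "left")
left_only
```

If you would rather not load stringr, base R's `trimws()` does the same job for this case.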