The Problem

Let’s say we have one value of a “Sex” variable that’s “Male”, and another that’s “ Male ”.

They’re meant to be the same value, but a string comparison treats them as different values because the spaces count as characters.

So we need to remove the leading and trailing spaces.

Let’s create an example string to demonstrate this:

# Create a simple string with spaces either side
s <- " Male "
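
To see why this matters beyond a single string, here is a minimal sketch of the same issue in a vector. The `sex` vector is made-up illustration data, not taken from a real dataset.

```r
# Hypothetical values standing in for a "Sex" column
sex <- c("Male", " Male ", "Female", "Male")

# table() counts the padded " Male " as its own category
counts <- table(sex)
print(counts)

# Three distinct values are reported, although only two are intended
n_distinct <- length(unique(sex))
print(n_distinct)
```

The extra category is easy to miss in printed tables, because the padding only shifts the label slightly.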

Identifying The Problem

Sometimes it’s not obvious just by looking at the data that the problem exists.

Three ways to check a value for this problem are to print it, look at its structure, and compare it with the expected string:

# Print it to see if it has leading/trailing spaces. Note that it does show there are leading and trailing spaces.
print(s)
## [1] " Male "
# Look at the structure of the string to see if that shows any leading/trailing spaces. Note that it does show there are leading and trailing spaces.
str(s)
##  chr " Male "
# Compare it with "Male" to see if it matches. Note that it returns FALSE - they don't match.
identical(s, "Male")
## [1] FALSE
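
Checking values one at a time does not scale to a whole column. One possible approach, sketched here with base R’s `trimws()` and a made-up `sex` vector, is to flag every element that changes when it is trimmed.

```r
# Hypothetical values; two of them carry stray spaces
sex <- c("Male", " Male ", "Female", "Male ")

# An element has padding if trimming it changes it
has_padding <- sex != trimws(sex)

# Positions of the affected elements
padded_positions <- which(has_padding)
print(padded_positions)

# Number of extra characters per element
extra_chars <- nchar(sex) - nchar(trimws(sex))
print(extra_chars)
```

Note that `trimws()` also strips tabs and newlines by default, so this flags any leading or trailing whitespace, not only spaces.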

Remove Leading/Trailing Spaces

Remove the spaces using the stringr package’s str_trim() function, like this:

# Load the stringr package so we can use the str_trim() function
library(stringr)
# Use the stringr package's str_trim() function to remove leading and trailing spaces
s <- str_trim(s)
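
If only one side needs trimming, `str_trim()` takes a `side` argument, and base R’s `trimws()` is an alternative when loading stringr is not an option. The sketch below uses a hypothetical `padded` string to show both.

```r
library(stringr)

# Hypothetical string with two spaces on each side
padded <- "  Male  "

# Trim only the leading or only the trailing spaces
left_trimmed  <- str_trim(padded, side = "left")
right_trimmed <- str_trim(padded, side = "right")

# Base R alternative that trims both sides by default
base_trimmed <- trimws(padded)

print(left_trimmed)
print(right_trimmed)
print(base_trimmed)
```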

Check That It’s Worked

You can check that it’s worked by using print(), str() or identical():

# Print it to see if it has leading/trailing spaces. It doesn't have any spaces now :)
print(s)
## [1] "Male"
# Look at the structure of the string to see if that shows any leading/trailing spaces. It doesn't have any spaces now :)
str(s)
##  chr "Male"
# Compare it with "Male" to see if it matches. Note that it returns TRUE - they now match! :)
identical(s, "Male")
## [1] TRUE
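
In practice the values usually live in a data frame column rather than a single string. Because `str_trim()` is vectorised, the same call can clean a whole column. The data frame `df` and its contents below are assumptions for illustration only.

```r
library(stringr)

# Hypothetical data frame with inconsistent padding in the Sex column
df <- data.frame(Sex = c("Male", " Male ", "Female "),
                 stringsAsFactors = FALSE)

# Trim every value in the column in one call
df$Sex <- str_trim(df$Sex)

# Only the two intended categories remain
remaining <- unique(df$Sex)
print(remaining)
```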