1 Goal


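The goal of this tutorial is to learn how to avoid the factor variable trap (FVT): calling as.numeric on a factor returns the internal level codes instead of the original values. The unfactor function from the varhandle package converts a factor to a numeric or character vector while keeping the original values.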


2 Data import


library(varhandle)

# In this tutorial we are going to use the iris dataset
# We will work with the numeric Sepal.Length column and the Species factor
data("iris")
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

3 The factor variable trap


# If we transform one column from numerical to factor
Sepal.Length <- factor(iris$Sepal.Length)

# Let's check the structure of this vector
str(Sepal.Length)
##  Factor w/ 35 levels "4.3","4.4","4.5",..: 9 7 5 4 8 12 4 8 2 7 ...
# If we convert this factor to numeric, we get back the level positions and not the real values
Sepal.Length <- as.numeric(Sepal.Length)

# Instead of the original values 5.1, 4.9, 4.7, 4.6 we get the level positions 9, 7, 5, 4
str(Sepal.Length)
##  num [1:150] 9 7 5 4 8 12 4 8 2 7 ...
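
The trap above can also be avoided with base R alone, without any extra package. The sketch below is illustrative and not part of the original tutorial code; the variable names (f, codes, values_chr, values_lvl) are assumptions chosen for this example. It contrasts the trap with two common base R idioms.

```r
# Base R only; iris ships with R
f <- factor(iris$Sepal.Length)

# Trap: as.numeric() on a factor returns the internal integer codes
codes <- as.numeric(f)

# Option 1: convert to character first, then to numeric
values_chr <- as.numeric(as.character(f))

# Option 2: convert the levels once, then index them by the factor codes
values_lvl <- as.numeric(levels(f))[f]

# Compare the first elements of each result with the original column
head(codes)
head(values_chr)
head(values_lvl)
head(iris$Sepal.Length)
```

Option 2 converts each distinct level only once, which is why the help page for factor describes it as slightly more efficient than Option 1.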

4 The unfactor function

4.1 Unfactor numerical variables


# If we transform one column from numerical to factor
Sepal.Length <- factor(iris$Sepal.Length)

# Let's check the structure of this vector
str(Sepal.Length)
##  Factor w/ 35 levels "4.3","4.4","4.5",..: 9 7 5 4 8 12 4 8 2 7 ...
# We use unfactor and directly get a numerical vector with the right values
Sepal.Length <- unfactor(Sepal.Length)

# We check that we have the proper values as numerical
str(Sepal.Length)
##  num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

4.2 Unfactor character variables


# If we unfactor the Species variable we get a character variable
Species <- unfactor(iris$Species)

# The unfactor function detects whether the factor holds character or numeric values
str(Species)
##  chr [1:150] "setosa" "setosa" "setosa" "setosa" "setosa" ...
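
To illustrate the idea of choosing the output type from the factor levels, here is a minimal base R sketch. The helper unfactor_sketch is hypothetical: it is not the varhandle implementation, and the real unfactor function may handle more cases. It assumes a factor counts as numeric only when every level parses as a number.

```r
# Hypothetical helper for illustration only; NOT the varhandle implementation
unfactor_sketch <- function(f) {
  stopifnot(is.factor(f))
  lev <- levels(f)
  # Try to parse every level as a number; unparseable levels become NA
  num <- suppressWarnings(as.numeric(lev))
  if (!anyNA(num)) {
    # Every level parsed: map the factor codes to the numeric level values
    num[f]
  } else {
    # At least one level is not a number: return the labels as character
    as.character(f)
  }
}

# A factor built from numbers comes back as numeric
str(unfactor_sketch(factor(iris$Sepal.Length)))

# A factor with text labels comes back as character
str(unfactor_sketch(iris$Species))
```

The design choice is to inspect the levels rather than the full vector, since a factor usually has far fewer levels than elements.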

5 Conclusion


In this tutorial we have learnt how to convert a factor to a numeric or character vector while avoiding the factor variable trap.