1 Goal

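The goal of this tutorial is to learn how to avoid the factor variable trap (FVT). The trap works like this: converting a factor straight to numeric with as.numeric() returns its internal level codes instead of the original values. The unfactor function from the varhandle package converts a factor vector back to a character or numeric vector, avoiding the FVT.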
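To see where the trap comes from, the short sketch below uses a made-up vector `f` that is not part of the original tutorial. It shows that a factor stores integer codes pointing into a vector of levels, and that `as.numeric()` returns those codes rather than the values they stand for.

```r
# A hypothetical factor built from numbers stored as text
f <- factor(c("10", "5", "20", "10"))

# The levels are sorted as character strings, so "5" comes after "20"
levels(f)
## [1] "10" "20" "5"

# Internally each element is an integer code pointing into levels(f)
as.integer(f)
## [1] 1 3 2 1

# as.numeric() returns those codes, not the original numbers: this is the trap
as.numeric(f)
## [1] 1 3 2 1
```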

2 Data import

library(varhandle)

# In this tutorial we are going to use the iris dataset
# It records four measurements for each of 150 plants from three Species
data("iris")
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

3 The factor variable trap

# If we transform one column from numerical to factor
Sepal.Length <- factor(iris$Sepal.Length)

# Let's check the structure of this vector
str(Sepal.Length)
##  Factor w/ 35 levels "4.3","4.4","4.5",..: 9 7 5 4 8 12 4 8 2 7 ...
# If we change this vector to numerical we get back the level codes (positions), not the real values
Sepal.Length <- as.numeric(Sepal.Length)

# Instead of 5.1, 4.9, 4.7, 4.6 we get 9, 7, 5, 4
str(Sepal.Length)
##  num [1:150] 9 7 5 4 8 12 4 8 2 7 ...
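Base R can also avoid the trap without varhandle. The sketch below uses two common idioms. The first goes through `as.character()` before parsing. The second, recommended in the R FAQ (entry 7.10), converts the levels once and then indexes them by the factor. The variable names are illustrative, and the comparison with `iris$Sepal.Length` is only there to confirm that the original values come back.

```r
data("iris")

# Rebuild the factor version of the column, as in the example above
sepal_factor <- factor(iris$Sepal.Length)

# Option 1: turn the codes into their level labels, then parse them as numbers
sepal_1 <- as.numeric(as.character(sepal_factor))

# Option 2: parse the (usually few) levels once, then index them by the factor codes
sepal_2 <- as.numeric(levels(sepal_factor))[sepal_factor]

# Both give back the original measurements
all.equal(sepal_1, iris$Sepal.Length)
## [1] TRUE
all.equal(sepal_2, iris$Sepal.Length)
## [1] TRUE
```

The second form does less parsing work, because it converts only the 35 distinct levels rather than all 150 elements.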

4 The unfactor function

4.1 Unfactor numerical variables

# If we transform one column from numerical to factor
Sepal.Length <- factor(iris$Sepal.Length)

# Let's check the structure of this vector
str(Sepal.Length)
##  Factor w/ 35 levels "4.3","4.4","4.5",..: 9 7 5 4 8 12 4 8 2 7 ...

# We use unfactor and directly get a numerical vector with the right values
Sepal.Length <- unfactor(Sepal.Length)

# We check that we have the proper values as numerical
str(Sepal.Length)
##  num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

4.2 Unfactor character variables

# If we unfactor the Species variable we get a character variable
Species <- unfactor(iris$Species)

# unfactor checks whether the levels look like numbers or text and returns a numeric or character vector accordingly
str(Species)
##  chr [1:150] "setosa" "setosa" "setosa" "setosa" "setosa" ...
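The automatic choice between numeric and character output can be approximated in base R. The function below, `my_unfactor()`, is hypothetical and written only to illustrate the idea. It is not varhandle's source code and leaves out the package's extra options. It maps each element to its level label, then converts to numeric only if every non-missing label parses as a number.

```r
data("iris")

# Hypothetical helper, NOT the varhandle implementation of unfactor
my_unfactor <- function(x) {
  stopifnot(is.factor(x))
  # Replace each integer code by its level label (NA codes stay NA)
  labels <- levels(x)[x]
  # Try to parse the labels as numbers, silencing the "NAs introduced" warning
  as_num <- suppressWarnings(as.numeric(labels))
  # If every non-missing label parsed, return numbers; otherwise keep the text
  if (all(is.na(as_num) == is.na(labels))) as_num else labels
}

str(my_unfactor(factor(iris$Sepal.Length)))
str(my_unfactor(iris$Species))
```

On the iris columns, the two `str()` calls should report a numeric and a character vector, respectively, matching the `unfactor()` results above.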

5 Conclusion

In this tutorial we have learnt how to convert a factor to a numeric or character vector with unfactor, avoiding the factor variable trap.