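The goal of this tutorial is to learn how to avoid the factor variable trap (FVT): calling as.numeric() on a factor returns the internal integer codes of its levels, not the values those levels represent. The unfactor function from the varhandle package converts a factor to a numeric or character vector while avoiding the FVT.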
library(varhandle)
# In this tutorial we are going to use the iris dataset
# We will convert some of its columns between numeric, factor and character
data("iris")
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# First we transform one column from numeric to factor
Sepal.Length <- factor(iris$Sepal.Length)
# Let's check the structure of this vector
str(Sepal.Length)
## Factor w/ 35 levels "4.3","4.4","4.5",..: 9 7 5 4 8 12 4 8 2 7 ...
# If we change this factor to numeric with as.numeric we get back the level positions and not the real values
Sepal.Length <- as.numeric(Sepal.Length)
# Instead of 5.1, 4.9, 4.7, 4.6 we get 9, 7, 5, 4
str(Sepal.Length)
## num [1:150] 9 7 5 4 8 12 4 8 2 7 ...
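The result above follows from how R stores a factor: an integer vector of codes plus a character vector of levels. The short sketch below uses only base R to make both parts visible; the variable names `f`, `codes` and `labels` are illustrative and not part of the original tutorial.

```r
# Build the same factor as in the tutorial
f <- factor(iris$Sepal.Length)

# A factor is stored as integer codes that point into its levels
codes <- as.integer(f)
labels <- levels(f)

head(codes)
## [1]  9  7  5  4  8 12

# as.numeric() on a factor returns these codes, which is the trap
identical(as.numeric(f), as.numeric(codes))
## [1] TRUE

# Indexing the levels by the codes recovers the original values as text
head(labels[codes])
## [1] "5.1" "4.9" "4.7" "4.6" "5"   "5.4"
```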
# We create the factor again from the original numeric column
Sepal.Length <- factor(iris$Sepal.Length)
# Let's check the structure of this vector
str(Sepal.Length)
## Factor w/ 35 levels "4.3","4.4","4.5",..: 9 7 5 4 8 12 4 8 2 7 ...
# We use unfactor and directly get a numerical vector with the right values
Sepal.Length <- unfactor(Sepal.Length)
# We check that we have the proper values as numerical
str(Sepal.Length)
## num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
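For comparison, the same conversion can be done without any package. The sketch below shows two common base R idioms; the names `via_character` and `via_levels` are illustrative, and this is an alternative to unfactor rather than a description of how it is implemented.

```r
f <- factor(iris$Sepal.Length)

# Idiom 1: convert to character first, then to numeric
via_character <- as.numeric(as.character(f))

# Idiom 2: convert only the levels, then index them by the factor's codes
via_levels <- as.numeric(levels(f))[f]

# Both recover the original measurements
all.equal(via_character, iris$Sepal.Length)
## [1] TRUE
all.equal(via_levels, iris$Sepal.Length)
## [1] TRUE
```

The second idiom converts each distinct level only once, which can matter for long vectors with few levels.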
# If we unfactor the Species variable we get a character variable
Species <- unfactor(iris$Species)
# The unfactor function detects whether the values of a factor are character or numeric
str(Species)
## chr [1:150] "setosa" "setosa" "setosa" "setosa" "setosa" ...
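To illustrate the idea of choosing the output type from the levels, here is a minimal base R sketch. The helper `unfactor_sketch` is hypothetical: it is not the varhandle implementation, and it assumes that "numeric" means every level can be parsed by as.numeric().

```r
# Hypothetical helper, NOT varhandle's unfactor: an assumption-based sketch
unfactor_sketch <- function(f) {
  stopifnot(is.factor(f))
  labels <- levels(f)
  # Try to parse the levels as numbers; unparseable levels become NA
  parsed <- suppressWarnings(as.numeric(labels))
  if (!anyNA(parsed)) {
    # Every level is a number: return the numeric values
    parsed[as.integer(f)]
  } else {
    # At least one level is not a number: return the labels as character
    labels[as.integer(f)]
  }
}

str(unfactor_sketch(factor(iris$Sepal.Length)))
str(unfactor_sketch(iris$Species))
```

Under these assumptions the first call yields a numeric vector and the second a character vector, matching the behaviour shown for unfactor above.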
In this tutorial we have learnt how to convert a factor to a numeric or character vector while avoiding the factor variable trap.