1 Goal


The goal of this tutorial is to help you avoid a common mistake when working with factors. When a factor whose levels are numbers is converted with as.numeric(), the result is the position of each level (its integer code), not the numbers stored in the labels. We will see how to spot this problem and how to check that a conversion went as expected.
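As a quick illustration of the mistake described above, here is a minimal sketch on a toy factor. The values in `f` are made up for this example and are not part of the iris dataset used later.

```r
# Toy factor whose labels look like numbers (hypothetical example values)
f <- factor(c("10", "20", "10", "30"))

# The levels are the distinct labels, sorted as text
levels(f)
## [1] "10" "20" "30"

# as.numeric() returns the position of each element's level,
# not the number written in the label
as.numeric(f)
## [1] 1 2 1 3
```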


2 Data preparation


# In this exercise we will build a character vector and a factor that contain numbers
# We will use the iris dataset as the source of those numbers
data("iris")
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

3 Turning a character vector into numerical


# We create a character vector from the Sepal.Length variable
char_vector <- as.character(iris$Sepal.Length)
str(char_vector)
##  chr [1:150] "5.1" "4.9" "4.7" "4.6" "5" "5.4" "4.6" ...
# We create a numerical vector from the character vector
num_vector <- as.numeric(char_vector)
str(num_vector)
##  num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# We plot the difference, which should be zero everywhere if the values were converted correctly
plot(num_vector - iris$Sepal.Length)

# A plot consisting only of zeroes confirms that the conversion was done correctly
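
The visual check above can be complemented with a numeric one. The following is a minimal sketch that assumes base R's `all.equal()` is an acceptable test for this purpose; the name `num_from_char` is introduced here only for illustration.

```r
data("iris")

# Same conversion as above: numbers -> character -> numbers
char_vector <- as.character(iris$Sepal.Length)
num_from_char <- as.numeric(char_vector)

# all.equal() compares the two vectors within a small numeric tolerance
all.equal(num_from_char, iris$Sepal.Length)
## [1] TRUE

# The largest absolute difference gives the same information as the plot
max(abs(num_from_char - iris$Sepal.Length))
```

A single TRUE/FALSE answer is easier to use inside a script than a plot, which has to be inspected by eye.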

4 Turning a factor into a numerical vector


# We create a factor type variable 
my_factor <- factor(iris$Sepal.Length)
str(my_factor)
##  Factor w/ 35 levels "4.3","4.4","4.5",..: 9 7 5 4 8 12 4 8 2 7 ...
# Now we try to extract the numerical values from the factor with as.numeric()
num_vector <- as.numeric(my_factor)
str(num_vector)
##  num [1:150] 9 7 5 4 8 12 4 8 2 7 ...
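# We plot the difference, which would be zero everywhere if the values had been converted correctly
# Here it is not: as.numeric() returned the level positions (9, 7, 5, ...) instead of the values (5.1, 4.9, 4.7, ...)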
plot(num_vector - iris$Sepal.Length)
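
One way to recover the original numbers is to convert the level labels rather than the factor itself. The sketch below shows two common base R idioms for this and repeats the check against the original column; the names `codes`, `num_a` and `num_b` are introduced here for illustration only.

```r
data("iris")
my_factor <- factor(iris$Sepal.Length)

# Problematic: as.numeric() on a factor returns the integer level codes
codes <- as.numeric(my_factor)

# Option 1: convert to character first, then to numeric
num_a <- as.numeric(as.character(my_factor))

# Option 2: convert the levels once, then index them with the factor
num_b <- as.numeric(levels(my_factor))[my_factor]

# Check each result against the original values
all.equal(num_a, iris$Sepal.Length)
## [1] TRUE
all.equal(num_b, iris$Sepal.Length)
## [1] TRUE
isTRUE(all.equal(codes, iris$Sepal.Length))
## [1] FALSE
```

Option 2 converts only the 35 distinct levels instead of all 150 elements, which can matter for long vectors; both options give the same result here.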