The goal of this tutorial is to show how to drop the empty levels of a factor that a subset inherits from the original, larger dataset.
# In this tutorial we are going to use the iris dataset
data("iris")
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# We can check how many observations there are at each level of the Species factor
str(iris$Species)
## Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
table(iris$Species)
##
## setosa versicolor virginica
## 50 50 50
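As a quick cross-check, base R's `nlevels()` reports how many levels a factor declares, and the per-level counts from `table()` should add up to the number of rows. A minimal sketch:

```r
# iris ships with R in the datasets package
data("iris")

# Number of levels declared by the factor
n_species <- nlevels(iris$Species)

# Observations per level, as returned by table()
species_counts <- table(iris$Species)

print(n_species)
print(species_counts)

# Every row belongs to exactly one level, so the counts sum to nrow(iris)
print(sum(species_counts) == nrow(iris))
```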
# Imagine that you want to select all plants except those of the species setosa
iris_sample <- iris[which(iris$Species != "setosa"), ]
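The problem is not specific to `which()` indexing. A sketch of the same selection written with base R's `subset()` shows that it also keeps every level of the original factor, including the one that no longer has any rows:

```r
data("iris")

# Same selection as above, written with subset()
iris_subset <- subset(iris, Species != "setosa")

# 100 rows remain (versicolor and virginica)...
print(nrow(iris_subset))

# ...and none of them are setosa...
print("setosa" %in% as.character(iris_subset$Species))

# ...but the factor still declares all three original levels
print(levels(iris_subset$Species))
```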
# Now let's look at the levels of the factor in the subset
levels(iris_sample$Species)
## [1] "setosa" "versicolor" "virginica"
str(iris_sample$Species)
## Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...
# and count how many observations there are at each level
table(iris_sample$Species)
##
## setosa versicolor virginica
## 0 50 50
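When a factor has many levels, spotting the zero counts by eye gets tedious. A small sketch that lists the empty levels programmatically, using only `table()` and `names()`:

```r
data("iris")
iris_sample <- iris[which(iris$Species != "setosa"), ]

# Count observations per level; empty levels have a count of zero
counts <- table(iris_sample$Species)

# Keep the names of the levels whose count is zero
empty_levels <- names(counts)[as.vector(counts == 0)]
print(empty_levels)
```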
# The subset has inherited an empty level ("setosa") from the larger dataset
# To drop empty levels, all we need to do is re-create the factor from the remaining data
iris_sample$Species <- factor(iris_sample$Species)
# Now the empty level is gone
levels(iris_sample$Species)
## [1] "versicolor" "virginica"
str(iris_sample$Species)
## Factor w/ 2 levels "versicolor","virginica": 1 1 1 1 1 1 1 1 1 1 ...
table(iris_sample$Species)
##
## versicolor virginica
## 50 50
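Base R also provides two shortcuts for the same task. `droplevels()` removes unused levels from a single factor, or from every factor column of a data frame at once, and the `[` method for factors accepts `drop = TRUE`, which drops unused levels while subsetting. A sketch of both:

```r
data("iris")
iris_sample <- iris[which(iris$Species != "setosa"), ]

# Option 1a: droplevels() on a single factor
species_a <- droplevels(iris_sample$Species)

# Option 1b: droplevels() on a data frame drops unused levels in all factor columns
iris_clean <- droplevels(iris_sample)

# Option 2: subset the factor itself with drop = TRUE
species_b <- iris$Species[iris$Species != "setosa", drop = TRUE]

print(levels(species_a))
print(levels(iris_clean$Species))
print(levels(species_b))
```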
In this tutorial we have learnt how to drop empty levels from a factor. This is useful when we study a subset of a dataset and don't need to keep track of levels that no longer have any observations.
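One place where leftover levels show up is in grouped summaries. In the sketch below, `tapply()` returns a value for every level of the grouping factor, so the empty setosa level produces an `NA` until the factor is re-created:

```r
data("iris")
iris_sample <- iris[which(iris$Species != "setosa"), ]

# Mean sepal length per species, with the inherited empty level still present
means_before <- tapply(iris_sample$Sepal.Length, iris_sample$Species, mean)
print(means_before)

# Same summary after re-creating the factor, as in the tutorial
iris_sample$Species <- factor(iris_sample$Species)
means_after <- tapply(iris_sample$Sepal.Length, iris_sample$Species, mean)
print(means_after)
```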