1 Goal


The goal of this tutorial is to drop the empty levels that a factor inherits when it is subset from a larger dataset.


2 Data preparation


# In this tutorial we are going to use the iris dataset
data("iris")
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# We can check how many observations each level of the Species factor has
str(iris$Species)
##  Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
table(iris$Species)
## 
##     setosa versicolor  virginica 
##         50         50         50

3 Creating a factor with empty levels


# Suppose we want to select all plants except setosa

iris_sample <- iris[which(iris$Species != "setosa"), ]

# Now we check the levels of the factor in the subset
levels(iris_sample$Species)
## [1] "setosa"     "versicolor" "virginica"
str(iris_sample$Species)
##  Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...
# And count how many observations of each level remain in the subset
table(iris_sample$Species)
## 
##     setosa versicolor  virginica 
##          0         50         50
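
The check above can also be done programmatically. The following is a minimal sketch, assuming that "empty" means a level with a zero count in table(); the names counts and empty_levels are only illustrative and do not come from the original tutorial.

```r
# Rebuild the subset so that this example is self-contained
data("iris")
iris_sample <- iris[which(iris$Species != "setosa"), ]

# Levels that are declared on the factor but have no remaining observations
counts <- table(iris_sample$Species)
empty_levels <- names(counts)[counts == 0]
empty_levels
## [1] "setosa"

# Number of declared levels versus number of levels actually present
nlevels(iris_sample$Species)
## [1] 3
length(unique(iris_sample$Species))
## [1] 2
```

If the two numbers differ, the factor still carries at least one empty level.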

4 Removing empty levels in a factor


# We have inherited an empty level from the bigger dataset
# To drop empty levels, we only need to re-create the factor from the remaining data
iris_sample$Species <- factor(iris_sample$Species)

# Now the empty level is gone
levels(iris_sample$Species)
## [1] "versicolor" "virginica"
str(iris_sample$Species)
##  Factor w/ 2 levels "versicolor","virginica": 1 1 1 1 1 1 1 1 1 1 ...
table(iris_sample$Species)
## 
## versicolor  virginica 
##         50         50
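
As an alternative to calling factor() again, base R also provides droplevels(), which can be applied to a single factor or to a whole data frame. The sketch below is not part of the original tutorial; the name iris_dropped is only illustrative.

```r
# Rebuild the subset so that this example is self-contained
data("iris")
iris_sample <- iris[which(iris$Species != "setosa"), ]

# On a data frame, droplevels() drops the unused levels of every factor column
iris_dropped <- droplevels(iris_sample)
levels(iris_dropped$Species)
## [1] "versicolor" "virginica"

# It also works on a single factor
levels(droplevels(iris_sample$Species))
## [1] "versicolor" "virginica"
```

This may be more convenient when a data frame has several factor columns, whereas factor() has to be applied to one column at a time.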

5 Conclusion


In this tutorial we have learnt how to drop empty levels from a factor. This is useful when we study a subset of a dataset and do not want the unused levels to keep appearing in summaries such as the output of table().