1 Goal


The goal of this tutorial is to learn how to replace specific names with more general categories read from a different table. This is useful when we want to analyse data by category rather than by individual product.


2 Preparing the data


# First of all we load the data
# For this tutorial we are going to use the iris plant dataset
data(iris)
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# Add a new column called name, which is just the row number
iris$name <- seq_len(nrow(iris))
# Reorder the columns so that name comes first
iris <- iris[, c(ncol(iris), 1:(ncol(iris) - 1))]
str(iris)
## 'data.frame':    150 obs. of  6 variables:
##  $ name        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# Now we create a lookup table containing just the name and the Species
iris_Species <- iris[, c("name", "Species")]
iris_Species$Species <- as.character(iris_Species$Species)
str(iris_Species)
## 'data.frame':    150 obs. of  2 variables:
##  $ name   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Species: chr  "setosa" "setosa" "setosa" "setosa" ...
# And remove the Species from the original table
iris$Species <- NULL
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ name        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...

3 Reading category and assigning to specific names


# Now we replace each name with the species recorded for it in iris_Species.
# Indexing rows by iris$name works here because name is simply the row number.
iris_Species[1, "Species"]
## [1] "setosa"
iris$name <- iris_Species[iris$name, "Species"]
head(iris)
##     name Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 setosa          5.1         3.5          1.4         0.2
## 2 setosa          4.9         3.0          1.4         0.2
## 3 setosa          4.7         3.2          1.3         0.2
## 4 setosa          4.6         3.1          1.5         0.2
## 5 setosa          5.0         3.6          1.4         0.2
## 6 setosa          5.4         3.9          1.7         0.4
# We have now replaced the name of each plant with its species
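
The lookup above indexes rows by position, which relies on the name being equal to the row number. When the lookup table is in a different order or the names are not row numbers, matching on the key column is safer. Below is a minimal sketch using base R's `match()`; the tables `lookup` and `measurements` and their values are hypothetical and made up for illustration.

```r
# Hypothetical lookup table, deliberately shuffled so that
# row position does NOT correspond to the name
lookup <- data.frame(
  name    = c(3, 1, 2),
  Species = c("virginica", "setosa", "versicolor"),
  stringsAsFactors = FALSE
)

# Hypothetical data table whose names we want to replace by category
measurements <- data.frame(
  name         = c(1, 2, 3, 2),
  Sepal.Length = c(5.1, 7.0, 6.3, 6.4)
)

# match() returns, for each name in measurements, the row of lookup
# holding the same name; we then pick the Species from that row
measurements$name <- lookup$Species[match(measurements$name, lookup$name)]
measurements
```

Names that do not appear in the lookup table come back as `NA`, which makes missing categories easy to spot.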

4 Conclusion


In this tutorial we have learnt how to replace the specific name of an entry with its broader category, read from a different table.
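
As a possible next step, once names have been replaced by categories, the data can be summarised per category. The sketch below rebuilds the tables from scratch and uses base R's `aggregate()` to compute the mean sepal length for each species. The choice of `Sepal.Length` and of the mean is only an example, and `by_category` is a name introduced here for illustration.

```r
# Rebuild the tables from a fresh copy of the iris dataset
data(iris)
iris$name <- seq_len(nrow(iris))
iris_Species <- data.frame(
  name    = iris$name,
  Species = as.character(iris$Species),
  stringsAsFactors = FALSE
)
iris$Species <- NULL

# Replace each name with its category, matching on the name column
iris$name <- iris_Species$Species[match(iris$name, iris_Species$name)]

# Summarise by category: mean sepal length for each species
by_category <- aggregate(Sepal.Length ~ name, data = iris, FUN = mean)
by_category
```

The result has one row per category instead of one row per plant.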