The goal of this tutorial is to learn how to quickly count the absolute and relative frequencies of the entries of a vector.
# In this tutorial we are going to use the iris dataset
# We will count the number of plants of each species
data("iris")
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# For a factor such as Species, summary() returns the number of entries in each level
summary(iris$Species)
##     setosa versicolor  virginica
##         50         50         50
str(summary(iris$Species))
## Named int [1:3] 50 50 50
## - attr(*, "names")= chr [1:3] "setosa" "versicolor" "virginica"
# We can also create a table with the frequencies
table(iris$Species)
##
##     setosa versicolor  virginica
##         50         50         50
str(table(iris$Species))
## 'table' int [1:3(1d)] 50 50 50
## - attr(*, "dimnames")=List of 1
## ..$ : chr [1:3] "setosa" "versicolor" "virginica"
# If we want a data frame, we can convert the table
data.frame(table(iris$Species))
##         Var1 Freq
## 1     setosa   50
## 2 versicolor   50
## 3  virginica   50
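# A possible variation (a sketch, not one of the original steps): naming the
# argument of table() and passing responseName to as.data.frame() gives more
# descriptive column names than the defaults Var1 and Freq.
# "Count" is only an illustrative name.
as.data.frame(table(Species = iris$Species), responseName = "Count")
##      Species Count
## 1     setosa    50
## 2 versicolor    50
## 3  virginica    50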
# To get the relative frequencies, we can divide each count by the total number of entries
table(iris$Species) / length(iris$Species)
##
##     setosa versicolor  virginica
##  0.3333333  0.3333333  0.3333333
# We can round the result to make it easier to read
round(table(iris$Species) / length(iris$Species), 2)
##
##     setosa versicolor  virginica
##       0.33       0.33       0.33
# However, there is a function that does this for us: prop.table()
prop.table(table(iris$Species))
##
##     setosa versicolor  virginica
##  0.3333333  0.3333333  0.3333333
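# One caveat, sketched on a small made-up vector x (not part of the iris data):
# table() leaves out NA by default, so prop.table() divides by the number of
# non-missing entries, while dividing by length() also counts the NA.
# Passing useNA = "ifany" to table() keeps the NA as a category of its own.
x <- c("a", "b", NA, "a")
table(x) / length(x)                   # a = 2/4, b = 1/4 (does not sum to 1)
prop.table(table(x))                   # a = 2/3, b = 1/3
prop.table(table(x, useNA = "ifany"))  # a = 2/4, b = 1/4, <NA> = 1/4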
# As before, we can round the iris proportions to get a nicer view
round(prop.table(table(iris$Species)), 2)
##
##     setosa versicolor  virginica
##       0.33       0.33       0.33
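# The absolute and relative frequencies can also be collected in a single
# data frame. This is only a sketch: the object names counts and freq and the
# column names Count and Proportion are illustrative choices.
counts <- table(iris$Species)
freq <- data.frame(
  Species    = names(counts),                           # the distinct values
  Count      = as.vector(counts),                       # absolute frequencies
  Proportion = round(as.vector(prop.table(counts)), 2)  # relative frequencies
)
freq
##      Species Count Proportion
## 1     setosa    50       0.33
## 2 versicolor    50       0.33
## 3  virginica    50       0.33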
In this tutorial we have learnt how to get the absolute and relative frequencies of the values in a vector, using summary() and table() for the counts and prop.table() for the proportions. This is useful when we want to know how often each value occurs or what share of the vector it represents.