The goal of this tutorial is to use a “mode” function to find the most common value within a row or column.
# The mode is simply the most frequent value of a sequence, be it numeric or character. For instance, for
# a vector such as c(3,5,1,2,3,4,1,5,5,3,2,3) the mode would be 3 (it occurs four times), and for a vector
# such as c("cat", "dog", "mouse", "cat") the mode would be "cat".
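# For reasons unknown to the author of this RPub, R has no built-in function for calculating the statistical
# mode (base R's mode() returns an object's storage mode instead). But it is very easy to build your own
# function. Here is one example; note that naming it mode masks the base function for the rest of the session: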
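mode <- function(x) {
  # Distinct values, in order of first appearance
  ux <- unique(x)
  # Count how often each distinct value occurs and return the most frequent one
  ux[which.max(tabulate(match(x, ux)))]
}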
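# As a quick sanity check, the sketch below runs the function on the two example vectors from above, plus
# two edge cases that are easy to overlook: ties and missing values. The tie and NA vectors are made up
# for illustration. The function is repeated so that the block runs on its own.

```r
# Same definition as above, repeated so this block is self-contained
mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

num_mode <- mode(c(3, 5, 1, 2, 3, 4, 1, 5, 5, 3, 2, 3))  # 3 occurs four times
chr_mode <- mode(c("cat", "dog", "mouse", "cat"))         # "cat" occurs twice

# Tie (made-up vector): 2 and 1 both occur twice
tie_mode <- mode(c(2, 1, 1, 2))

# Missing values (made-up vector): NA occurs twice, 1 occurs once
na_mode <- mode(c(NA, NA, 1))

print(num_mode)
print(chr_mode)
print(tie_mode)
print(na_mode)
```

# With a tie, the value that appears first in the input wins, because which.max() returns the first maximum.
# NA is counted like any other value, so drop missing values first (for example with x[!is.na(x)]) if that
# is not what you want.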
# Let's test the mode function on the starwars dataset, which comes with the dplyr package.
# So first load dplyr (with library(dplyr) or library(tidyverse)).
# Feel free to have a quick look at the dataset first:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
head(starwars)
## # A tibble: 6 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Luke Sky~ 172 77.0 blond fair blue 19.0 male
## 2 C-3PO 167 75.0 <NA> gold yellow 112 <NA>
## 3 R2-D2 96 32.0 <NA> white, bl~ red 33.0 <NA>
## 4 Darth Va~ 202 136 none white yellow 41.9 male
## 5 Leia Org~ 150 49.0 brown light brown 19.0 female
## 6 Owen Lars 178 120 brown, gr~ light blue 52.0 male
## # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# Now we want to see the mode of the column "height":
mode(starwars$height)
## [1] 183
# Which is the correct value, as we can check ourselves by using table:
table(starwars$height)
##
## 66 79 88 94 96 97 112 122 137 150 157 160 163 165 166 167 168 170
## 1 1 1 1 2 1 1 1 1 2 1 1 2 3 1 2 1 4
## 171 172 173 175 177 178 180 182 183 184 185 188 190 191 193 196 198 200
## 1 1 1 3 1 4 5 1 7 1 1 5 1 3 3 3 2 1
## 202 206 213 216 224 228 229 234 264
## 1 2 1 1 1 1 1 1 1
# With 7 occurrences, 183 is the most frequent value.
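# If you would rather not scan the table by eye, one option is to sort the counts so that the most frequent
# value comes first. A minimal sketch, assuming a small made-up vector called heights (hypothetical data,
# not the starwars heights):

```r
# Hypothetical example data, made up for illustration
heights <- c(183, 180, 183, 170, 180, 183, NA)

# table() drops NA by default; sorting puts the most frequent value first
counts <- sort(table(heights), decreasing = TRUE)

# The names of a table are the observed values, stored as character strings
top_value <- as.numeric(names(counts)[1])
top_count <- counts[[1]]

print(counts)
print(top_value)
print(top_count)
```

# Unlike the mode function, this approach ignores missing values unless you pass useNA = "ifany" to table().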
# Now we want to answer a more pressing question: are there more male or female characters in the entire Star Wars universe?
mode(starwars$gender)
## [1] "male"
# Just to double check, using table on the column "gender" reveals the harsh truth: in a galaxy far, far
# away, there is no gender equality ...
table(starwars$gender)
##
## female hermaphrodite male none
## 19 1 62 2
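# The same function can also be applied to several columns or rows at once. A minimal sketch, assuming a
# small made-up data frame called ratings (hypothetical data, not part of dplyr): sapply() applies mode to
# each column, and apply() with MARGIN = 1 applies it to each row.

```r
# Same definition as earlier, repeated so this block is self-contained
mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical example data, made up for illustration
ratings <- data.frame(
  q1 = c(1, 2, 2, 3),
  q2 = c(4, 4, 5, 4),
  q3 = c(1, 2, 2, 4)
)

# One mode per column: 2 for q1, 4 for q2, 2 for q3
col_modes <- sapply(ratings, mode)

# One mode per row: 1, 2, 2, 4
row_modes <- apply(ratings, 1, mode)

print(col_modes)
print(row_modes)
```

# Note that apply() converts the data frame to a matrix first, so the row-wise version is best suited to
# columns that all share one type.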
In this tutorial we have learnt what a mode is, how to create a mode function, and how to use it to quickly establish the most frequent value within a data sequence.