1 Goal


The goal of this tutorial is to write and use a “mode” function that finds the most common value within a row or column of a data set.


2 Mode Function


# The mode is simply the most frequent value of a sequence, whether numeric or character. For instance,
# for a vector such as c(3,5,1,2,3,4,1,5,5,3,2,3) the mode is 3 (it occurs four times), and for a vector
# such as c("cat", "dog", "mouse", "cat") the mode is "cat".

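# For reasons unknown to the author of this RPub, there is no built-in function for calculating the
# statistical mode in R (base R's mode() reports the storage mode of an object instead).
# But it is very easy to build your own function. Here is one example.
# Note that naming it "mode" masks base R's mode() for the rest of the session.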

mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
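To see why this one-liner works, it can help to run its steps one at a time. The sketch below does that for the example vector from the start of this section. The intermediate names (idx, counts, result) are made up for illustration and are not part of the function above.

```r
# Step-by-step version of the one-liner, using the example vector from above.
x <- c(3, 5, 1, 2, 3, 4, 1, 5, 5, 3, 2, 3)

ux     <- unique(x)      # distinct values in order of first appearance: 3 5 1 2 4
idx    <- match(x, ux)   # for each element of x, its position within ux
counts <- tabulate(idx)  # how often each position occurs: 4 3 2 2 1
result <- ux[which.max(counts)]  # value belonging to the highest count

print(result)
# [1] 3
```

Because which.max() returns the position of the first maximum, a tie is resolved in favour of the value that appears first in the input: for c("a", "b", "b", "a") this approach returns "a".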

3 Using mode to get the most frequent value


# Let's test the mode function on the starwars data set, which comes with the dplyr package.
# So first load dplyr (with library(dplyr) or library(tidyverse)).
# Feel free to have a quick look at the data set first:
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
head(starwars)
## # A tibble: 6 x 13
##   name      height  mass hair_color skin_color eye_color birth_year gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
## 1 Luke Sky~    172  77.0 blond      fair       blue            19.0 male  
## 2 C-3PO        167  75.0 <NA>       gold       yellow         112   <NA>  
## 3 R2-D2         96  32.0 <NA>       white, bl~ red             33.0 <NA>  
## 4 Darth Va~    202 136   none       white      yellow          41.9 male  
## 5 Leia Org~    150  49.0 brown      light      brown           19.0 female
## 6 Owen Lars    178 120   brown, gr~ light      blue            52.0 male  
## # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# Now we want to see the mode of the column "height"

mode(starwars$height)
## [1] 183
# This is the correct value, as we can check ourselves by using table():

table(starwars$height)
## 
##  66  79  88  94  96  97 112 122 137 150 157 160 163 165 166 167 168 170 
##   1   1   1   1   2   1   1   1   1   2   1   1   2   3   1   2   1   4 
## 171 172 173 175 177 178 180 182 183 184 185 188 190 191 193 196 198 200 
##   1   1   1   3   1   4   5   1   7   1   1   5   1   3   3   3   2   1 
## 202 206 213 216 224 228 229 234 264 
##   1   2   1   1   1   1   1   1   1
# With 7 occurrences, 183 is the most frequent value.
# Now we want to answer a more pressing question: are there more male or female characters in the entire Star Wars universe?

mode(starwars$gender)
## [1] "male"
# Just to double check, using table on the column "gender" reveals the harsh truth: in a galaxy far, far
# away, there is no gender equality ...

table(starwars$gender)
## 
##        female hermaphrodite          male          none 
##            19             1            62             2
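The examples above all work on a single column. As a minimal sketch of the row-wise and column-wise use mentioned in the goal, the same function can be passed to apply(). The matrix below is made-up data, and the helper is named most_frequent here (a hypothetical name) only so that this block does not mask base R's mode().

```r
# Same logic as the mode function above, under a hypothetical name.
most_frequent <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Made-up data: one row per respondent, one column per question.
answers <- matrix(
  c("yes", "no",  "yes",
    "no",  "no",  "no",
    "yes", "yes", "no",
    "yes", "no",  "no"),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("q1", "q2", "q3"))
)

col_modes <- apply(answers, 2, most_frequent)  # most frequent answer per question
row_modes <- apply(answers, 1, most_frequent)  # most frequent answer per respondent

print(col_modes)  # values: "yes" "no" "no"
print(row_modes)  # values: "yes" "no" "yes" "no"
```

One caveat: apply() converts a data frame to a matrix first, so on a data frame with mixed column types (such as starwars) the row-wise values are coerced to a common type, usually character, before the function sees them.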

4 Conclusion


In this tutorial we have learnt what a mode is, how to create a mode function, and how to use it to quickly establish the most frequent value within a data sequence.