In this hands-on exercise, you will learn how to visualise and analyse multivariate categorical data using the mosaic plot.
For this exercise, the vcd package of R will be used. Install the vcd package if it has yet to be installed on your computer. You also need to ensure that the tidyverse family of packages is available for this hands-on exercise.
The code chunk below will accomplish the task.
packages <- c('vcd', 'vcdExtra', 'tidyverse')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  # Attach the package for this session
  library(p, character.only = TRUE)
}
In this hands-on exercise, the famous Titanic data set will be used.
First, import the data into R by using the code below.
titanic <- read_csv("data/Titanic.csv")
Notice that all the variables are categorical.
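If data/Titanic.csv is not at hand, a comparable data frame with one row per person can be rebuilt from R's built-in Titanic table. This is only a sketch under that assumption: the built-in table labels its levels differently (for example 1st rather than First), and the name titanic_cases is hypothetical.

```r
# Assumption: R's built-in Titanic table stands in for data/Titanic.csv
counts <- as.data.frame(Titanic)  # columns: Class, Sex, Age, Survived, Freq

# Repeat each row Freq times to get one row per person (case form)
titanic_cases <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                        c("Class", "Sex", "Age", "Survived")]
rownames(titanic_cases) <- NULL

# Every column should be a factor, i.e. categorical
sapply(titanic_cases, is.factor)

# Number of people recorded in the data
nrow(titanic_cases)
```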
A mosaic plot is a visual representation of the association between two or more categorical variables.
A mosaic plot is an area-proportional visualisation of observed frequencies, composed of tiles (corresponding to the cells in the contingency table) created by recursive vertical and horizontal splits of a rectangle. The area of each tile is proportional to the corresponding cell entry.
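The area rule can be checked by hand. The sketch below uses the Sex by Age counts tabulated in this exercise and base R only. It splits the unit rectangle first by Sex and then by Age within each Sex, and confirms that each tile's area equals its cell count divided by the total. The object names are illustrative.

```r
# Sex-by-Age counts as tabulated in this exercise
counts <- matrix(c(425, 45, 1667, 64), nrow = 2, byrow = TRUE,
                 dimnames = list(Sex = c("Female", "Male"),
                                 Age = c("Adult", "Child")))

# First split: share of the rectangle given to each Sex
sex_share <- rowSums(counts) / sum(counts)

# Second split: share of each Age within a Sex
age_within_sex <- prop.table(counts, margin = 1)

# Tile area = Sex share x Age share within that Sex
tile_area <- sweep(age_within_sex, 1, sex_share, "*")

# Compare against cell count / total
all.equal(tile_area, counts / sum(counts))
```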
##        Age Adult Child
## Sex
## Female       425    45
## Male        1667    64
mosaic(~ Sex + Age, data = titanic)
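The count tables printed with the plots in this exercise look like the output of vcd's structable(), which accepts the same formula as mosaic(). A minimal sketch, assuming R's built-in Titanic table in place of the titanic data frame (its levels are labelled and ordered differently, so the printout will not match line for line):

```r
library(vcd)

# Two-way table: Sex in the rows, Age in the columns
sex_age <- structable(~ Sex + Age, data = Titanic)
sex_age

# Three-way table: row and column variables alternate in formula order
sex_age_survived <- structable(~ Sex + Age + Survived, data = Titanic)
sex_age_survived
```

Printing the table before drawing the plot makes it easier to see which count each tile represents.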
##                 Age Adult Child
## Sex    Survived
## Female No             109    17
##        Yes            316    28
## Male   No            1329    35
##        Yes            338    29
mosaic(~ Sex + Age + Survived, data = titanic)
mosaic(Survived ~ Sex, data = titanic)
mosaic(Survived ~ Sex + Age, data = titanic)
mosaic(~ Sex + Survived, data = titanic,
       main = "Survival on the Titanic", shade = TRUE, legend = TRUE)
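With shade = TRUE, vcd colours each tile by its residual from the independence model (Pearson residuals by default), and legend = TRUE adds the colour key. The sketch below reproduces those residuals in base R for the Sex by Survived table. The counts are the tables in this exercise summed over Age, and the object names are illustrative.

```r
# Sex-by-Survived counts, summed over Age
sex_survived <- matrix(c(126, 344, 1364, 367), nrow = 2, byrow = TRUE,
                       dimnames = list(Sex = c("Female", "Male"),
                                       Survived = c("No", "Yes")))

# Expected counts if Sex and Survived were independent
expected <- outer(rowSums(sex_survived), colSums(sex_survived)) /
  sum(sex_survived)

# Pearson residual: (observed - expected) / sqrt(expected)
pearson <- (sex_survived - expected) / sqrt(expected)
round(pearson, 2)
```

A positive residual means more cases than independence would predict, and a negative residual means fewer.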
##                Sex Female Male
## Age   Survived
## Adult No              109 1329
##       Yes             316  338
## Child No               17   35
##       Yes              28   29
mosaic(~ Age + Sex + Survived, data = titanic,
       main = "Survival on the Titanic", shade = TRUE, legend = TRUE)
##                Sex Female Male
## Survived Age
## No       Adult        109 1329
##          Child         17   35
## Yes      Adult        316  338
##          Child         28   29
mosaic(~ Survived + Sex + Age, data = titanic,
       main = "Survival on the Titanic", shade = TRUE, legend = TRUE)
##              Survived   No  Yes
## Sex    Age
## Female Adult           109  316
##        Child            17   28
## Male   Adult          1329  338
##        Child            35   29
mosaic(~ Sex + Survived + Age, data = titanic,
       main = "Survival on the Titanic", shade = TRUE, legend = TRUE)
##                Sex   Female                     Male
##                Class   Crew First Second Third  Crew First Second Third
## Survived Age
## No       Adult            3     4     13    89   670   118    154   387
##          Child            0     0      0    17     0     0      0    35
## Yes      Adult           20   140     80    76   192    57     14    75
##          Child            0     1     13    14     0     5     11    13
mosaic(~ Survived + Age + Sex + Class, data = titanic,
       main = "Survival on the Titanic", shade = TRUE, legend = TRUE)