Overview

In this hands-on exercise, you will learn how to visualise and analyse multivariate categorical data using the mosaic plot visualisation technique.

Installing and Launching R Packages

For this exercise, the vcd package of R will be used. Install the vcd package if it has not yet been installed on your computer. You also need to ensure that the tidyverse family of packages is available for this hands-on exercise.

The code chunk below installs any missing packages and then loads them.

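# Packages needed for this exercise
packages <- c('vcd', 'vcdExtra', 'tidyverse')

# Install any package that is not yet available, then load it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}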

Data Preparation

In this hands-on exercise, the famous Titanic data set will be used.

Importing Data

First, import the data into R by using the code below.

titanic <- read_csv("data/Titanic.csv")

Notice that all the variables are categorical.
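
Because the variables are categorical, the data can be summarised as a contingency table of counts, which is what a mosaic plot draws. A minimal sketch with base R's xtabs(), using a small made-up data frame (toy and toy_tab are hypothetical names, and the rows are not the Titanic data):

```r
# Hypothetical case-form data: one row per person (not the real Titanic data)
toy <- data.frame(
  Sex      = c("Female", "Male", "Male", "Female", "Male"),
  Survived = c("Yes", "No", "No", "Yes", "Yes")
)

# Cross-tabulate the two categorical variables into a table of counts
toy_tab <- xtabs(~ Sex + Survived, data = toy)
print(toy_tab)
```

The same one-sided formula style (~ Sex + Survived) is what the mosaic() calls in this exercise use to name the variables to cross-tabulate.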

Mosaic Plot

A mosaic plot is a visual representation of the association between two or more categorical variables.

A mosaic plot is an area-proportional visualisation of observed frequencies, composed of tiles (corresponding to the cells in the contingency table) created by recursive vertical and horizontal splits of a rectangle. The area of each tile is proportional to the corresponding cell entry.
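
The recursive splitting can be checked numerically. The sketch below uses only base R and the built-in Titanic table from the datasets package, which is assumed here to hold the same counts as the CSV file used in this exercise. The names first_split, second_split and tile_area are illustrative only.

```r
# Built-in 4-way table; assumed to match the counts in data/Titanic.csv
tab <- margin.table(Titanic, margin = c(2, 3))   # Sex x Age

# Target: each tile's area should equal the cell's share of all passengers
joint <- prop.table(tab)

# First split: divide the rectangle by the share of each Sex
first_split <- margin.table(joint, 1)

# Second split: within each Sex, divide by the share of each Age group
second_split <- prop.table(tab, margin = 1)

# Tile area = first split share x second split share (row i scaled by Sex i)
tile_area <- second_split * as.vector(first_split)

print(round(tile_area, 4))
print(isTRUE(all.equal(as.vector(tile_area), as.vector(joint))))
```

The product of the two split proportions reproduces the joint proportions, which is why tile area is proportional to the cell count.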

Plotting with the basic mosaic function

Visualising two-way table

##        Age Adult Child
## Sex                   
## Female       425    45
## Male        1667    64
mosaic(~ Sex + Age, data = titanic)

Visualising three-way table

##                 Age Adult Child
## Sex    Survived                
## Female No             109    17
##        Yes            316    28
## Male   No            1329    35
##        Yes            338    29
mosaic(~ Sex + Age + Survived, data = titanic)

Working with highlighting

Visualising two-way table

mosaic(Survived ~ Sex, data = titanic)

Visualising three-way table

mosaic(Survived ~ Sex + Age, data = titanic)
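
In these calls the variable on the left of the formula (Survived) is the one that is highlighted, so the highlighted share of each tile can be read as a conditional proportion. A minimal sketch of the numbers behind the two-way case, using base R and the built-in Titanic table (assumed to hold the same counts as the CSV file; surv_rate is an illustrative name):

```r
# Built-in 4-way table; assumed to match the counts in data/Titanic.csv
tab <- margin.table(Titanic, margin = c(2, 4))   # Sex x Survived

# Proportion surviving within each Sex (each row sums to 1)
surv_rate <- prop.table(tab, margin = 1)
print(round(surv_rate, 3))
```

A large difference between the rows suggests that survival is associated with sex, which is what the highlighted plot shows visually.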

Working with shade and legend arguments

Visualising two-way table

mosaic(~ Sex + Survived, data = titanic,
       main = "Survival on the Titanic", shade = TRUE, legend = TRUE)
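
With shade = TRUE, the tiles are coloured according to residuals from a model of independence between the variables; to the best of our reading of the vcd documentation, these are Pearson residuals by default. The sketch below computes such residuals by hand in base R, using the built-in Titanic table (assumed to hold the same counts as the CSV file; expected and pearson are illustrative names).

```r
# Built-in 4-way table; assumed to match the counts in data/Titanic.csv
tab <- margin.table(Titanic, margin = c(2, 4))   # Sex x Survived

# Expected counts if Sex and Survived were independent
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson <- (tab - expected) / sqrt(expected)
print(round(pearson, 2))

# Cross-check against the residuals reported by chisq.test()
print(isTRUE(all.equal(as.vector(pearson),
                       as.vector(chisq.test(tab)$residuals))))
```

Positive residuals mark cells with more observations than independence would predict, and negative residuals mark cells with fewer.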

Visualising three-way table

##                Sex Female Male
## Age   Survived                
## Adult No              109 1329
##       Yes             316  338
## Child No               17   35
##       Yes              28   29
mosaic(~ Age + Sex + Survived, data = titanic,
       main = "Survival on the Titanic", shade = TRUE, legend = TRUE)

##                Sex Female Male
## Survived Age                  
## No       Adult        109 1329
##          Child         17   35
## Yes      Adult        316  338
##          Child         28   29
mosaic(~ Survived + Sex + Age, data = titanic,
       main = "Survival on the Titanic", shade = TRUE, legend = TRUE)

##              Survived   No  Yes
## Sex    Age                     
## Female Adult           109  316
##        Child            17   28
## Male   Adult          1329  338
##        Child            35   29
mosaic(~ Sex + Survived + Age, data = titanic,
       main = "Survival on the Titanic", shade = TRUE, legend = TRUE)
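
The three-way plots above use the same counts; only the order of the variables in the formula, and therefore the order of the splits, differs. A minimal sketch in base R that reorders the dimensions of the built-in Titanic table (assumed to hold the same counts as the CSV file; tab3 and age_first are illustrative names) and confirms that the cell counts are unchanged:

```r
# Built-in 4-way table; assumed to match the counts in data/Titanic.csv
tab3 <- margin.table(Titanic, margin = c(2, 3, 4))   # Sex x Age x Survived

# Same counts with the first two dimensions swapped: Age x Sex x Survived
age_first <- aperm(tab3, c(2, 1, 3))

print(names(dimnames(tab3)))
print(names(dimnames(age_first)))

# A given cell holds the same count under either ordering
print(tab3["Female", "Adult", "Yes"])
print(age_first["Adult", "Female", "Yes"])
```

Because the counts are identical, differences between the plots come from which variable is split first, so it can help to try several orderings before interpreting a pattern.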

Visualising four-way table

##                Sex   Female                    Male                   
##                Class   Crew First Second Third Crew First Second Third
## Survived Age                                                          
## No       Adult            3     4     13    89  670   118    154   387
##          Child            0     0      0    17    0     0      0    35
## Yes      Adult           20   140     80    76  192    57     14    75
##          Child            0     1     13    14    0     5     11    13
mosaic(~ Survived + Age + Sex + Class, data = titanic,
       main = "Survival on the Titanic", shade = TRUE, legend = TRUE)