Visualizing Categorical Pairs

Harold Nelson

2024-06-10

Setup

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
load("comics.Rdata")

Mosaicplot

mosaicplot() is a base R function, which I prefer to barplots when looking at the relationship between two categorical variables. The area of each tile is proportional to the count of observations in that combination of categories.

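# Drop the "Reformed Criminals" alignment, then discard any factor level left empty
comics <- comics %>% 
  filter(align != "Reformed Criminals") %>% 
  droplevels()
# Rows are publishers, columns are alignments
mosaicplot(table(comics$publisher, comics$align))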
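
To make the link between counts and tile areas concrete, here is a minimal sketch on a small made-up data frame. The `toy` data and its values are hypothetical and only stand in for the publisher and align columns of comics.

```r
# Toy data standing in for two categorical columns (hypothetical values).
toy <- data.frame(
  publisher = c("dc", "dc", "dc", "marvel", "marvel", "marvel", "marvel", "marvel"),
  align     = c("Good", "Bad", "Bad", "Good", "Good", "Bad", "Bad", "Neutral")
)

# Counts for each publisher/align combination.
counts <- table(toy$publisher, toy$align)
print(counts)

# Share of each alignment within a publisher (each row sums to 1).
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 2))

# The mosaic plot draws one tile per cell of the table,
# with tile area proportional to the cell count.
mosaicplot(counts, main = "Toy example", xlab = "publisher", ylab = "align")
```

Printing the table next to the plot is a quick way to check that the largest tiles really correspond to the largest counts.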

Jitter

geom_jitter() produces a cloud of points at the intersection of each row and column. With a lot of points, you need to reduce the point size so that the clouds remain readable.

comics %>% 
  ggplot(aes(x = publisher, y = align)) +
  geom_jitter(size = .001)
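
Besides shrinking the points, the spread and transparency of the clouds can also be adjusted. The sketch below uses simulated data, so `toy` and the particular `width`, `height`, `alpha` and `size` values are illustrative assumptions rather than settings tuned for comics.

```r
library(ggplot2)

# Simulated categorical data (hypothetical); the seed makes it reproducible.
set.seed(1)
toy <- data.frame(
  publisher = sample(c("dc", "marvel"), 500, replace = TRUE),
  align     = sample(c("Bad", "Good", "Neutral"), 500, replace = TRUE)
)

# width/height limit how far points are moved from each intersection;
# alpha makes overlapping points show up as darker regions.
p <- ggplot(toy, aes(x = publisher, y = align)) +
  geom_jitter(width = 0.3, height = 0.3, alpha = 0.3, size = 0.5)
p
```

Keeping `width` and `height` below 0.5 leaves a visible gap between neighbouring clouds.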

Count

geom_count() puts a circle at each intersection. The size of the circle is scaled to the count of items at that intersection.

comics %>% 
  ggplot(aes(x = publisher, y = align)) +
  geom_count()
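
To see the numbers behind the circles, the combinations can be tabulated with dplyr's count(). The sketch below does this on a small made-up data frame (`toy` is hypothetical) and adds scale_size_area(), which is one way to make circle area proportional to the count; the `max_size` value is an arbitrary choice.

```r
library(ggplot2)
library(dplyr)

# Toy data standing in for two categorical columns (hypothetical values).
toy <- data.frame(
  publisher = c("dc", "dc", "dc", "marvel", "marvel", "marvel", "marvel", "marvel"),
  align     = c("Good", "Bad", "Bad", "Good", "Good", "Bad", "Bad", "Neutral")
)

# One row per publisher/align combination, with its count in column n.
toy_counts <- count(toy, publisher, align)
print(toy_counts)

# geom_count() computes the same counts internally and maps them to size;
# scale_size_area() makes a count of zero correspond to zero area.
p <- ggplot(toy, aes(x = publisher, y = align)) +
  geom_count() +
  scale_size_area(max_size = 12)
p
```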