Sm🄰rt plotting with 🅤🅝🅘🅒🅞🅓🅔

Author

James Silva Garcia

Published

March 30, 2023

Welcome

Thank you very much for reading this post. Today I want to share some tips for designing neat, creative visualizations with ggplot2. I place strong emphasis on taking full control of colors, theme layout, and other graphical components; I firmly believe that we should do as much as we can to help readers interpret visualizations easily. Some characteristics of coding for smart visuals are: choosing colors that convey psychological meaning (like green for hope and red for danger), always remembering that some readers might be colorblind (hence the use of colorblind-friendly color coding), and smartly selecting Unicode plotting characters (see more at Unicode search) to enhance communication.

Colorblind-friendly + Unicode, a great combo!

I always use the RStudio IDE to write visualization code. One of the great features of this IDE is that, after you type a color name or hex code, it previews how the color will look, as illustrated below:

colorblindFriendly = c('#1B9E77','#D95F02','#7570B3','#E7298A','#66A61E','#E6AB02','#A6761D','#666666',
                       '#66C2A5','#FC8D62','#8DA0CB','#E78AC3','#A6D854','#FFD92F','#E5C494','#B3B3B3',
                       '#A6CEE3','#1F78B4','#B2DF8A','#33A02C','#FB9A99','#E31A1C','#FDBF6F','#FF7F00',
                       '#CAB2D6','#6A3D9A','#FFFF99','#B15928','gray15')
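The hex codes above are not arbitrary: the first 28 entries are exactly three ColorBrewer qualitative palettes (Dark2 with 8 colors, Set2 with 8, and Paired with 12), followed by 'gray15'. As a sketch, assuming the RColorBrewer package is installed, the same vector can be rebuilt from those palettes, and ColorBrewer's own colorblind flag can be listed for its qualitative palettes:

```r
# Sketch: rebuild the colorblindFriendly vector from ColorBrewer palettes
# (assumes the RColorBrewer package is installed)
library(RColorBrewer)

fromBrewer = c(brewer.pal(8, "Dark2"),    # entries 1-8
               brewer.pal(8, "Set2"),     # entries 9-16
               brewer.pal(12, "Paired"),  # entries 17-28
               "gray15")                  # final dark gray

# Qualitative palettes that ColorBrewer itself flags as colorblind-safe
qual = brewer.pal.info[brewer.pal.info$category == "qual", ]
safeQual = rownames(qual)[qual$colorblind]
print(safeQual)
```

Building the vector this way makes its origin explicit and lets ColorBrewer's metadata, rather than memory, decide which palettes count as colorblind-safe.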

Using colorblind-friendly alternatives is much better than letting ggplot2 use its default colors. Likewise, using Unicode characters in R is extremely easy and very powerful. For example, by calling intToUtf8() inside paste0() we can produce a smart Unicode annotation:

paste0('Sm', intToUtf8(127280),'rt ',intToUtf8(c(127332,127325,127320,127314,127326,127315,127316)))
[1] "Sm🄰rt 🅤🅝🅘🅒🅞🅓🅔"
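The code points above follow a simple layout: 127280 is U+1F130, the squared letter A, and 127312 is U+1F150, the negative circled letter A, with the remaining letters following in alphabetical order. A small sketch (the helper names negCircled and squared are hypothetical, not from any package) that computes the annotation from plain text instead of hard-coding the numbers:

```r
# Hypothetical helpers: map plain A-Z letters onto two Unicode letter
# blocks that are laid out in alphabetical order
negCircled = function(word) {
  up = toupper(word)
  stopifnot(grepl("^[A-Z]+$", up))              # plain ASCII letters only
  offsets = utf8ToInt(up) - utf8ToInt("A")      # A = 0, B = 1, ..., Z = 25
  intToUtf8(0x1F150 + offsets)                  # U+1F150 = negative circled A
}

squared = function(word) {
  up = toupper(word)
  stopifnot(grepl("^[A-Z]+$", up))
  offsets = utf8ToInt(up) - utf8ToInt("A")
  intToUtf8(0x1F130 + offsets)                  # U+1F130 = squared A
}

# Rebuild the annotation without typing any code point by hand
annotation = paste0('Sm', squared("a"), 'rt ', negCircled("unicode"))
annotation
```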

Typical mtcars visualization example

mtcars is one of the popular data sets bundled with R and is frequently used to illustrate ggplot2 features. A typical on-the-fly example is reproduced below:

library(ggplot2)
library(dplyr)   # provides the %>% pipe and mutate() used throughout

ggsave("images/mtcarsFig.png",
       mtcars %>%
         ggplot(aes(x = cyl, y = mpg, group = cyl, color = factor(cyl))) +
         geom_boxplot() +
         geom_jitter() +
         ggtitle("A typical example plot with default features"),
       width=7.5, height=7.5/1.618, units="in"
)

As we can see, we let ggplot2 use its default colors and plotting characters, and we did not use any theme() options to make the visualization neater.
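To see exactly which colors ggplot2 picked, a quick sketch (assuming ggplot2 and scales are installed, and that no global option overrides the default discrete color scale) compares the default hue palette from the scales package with the colors assigned in a default boxplot. For three groups the palette is a salmon red, a green, and a blue; the red-green pair may be hard to tell apart for readers with red-green color vision deficiency.

```r
library(ggplot2)
library(scales)   # hue_pal() generates ggplot2's default discrete colors

# Default discrete color palette for 3 groups (hue palette, c = 100, l = 65)
defaultCols = hue_pal()(3)
print(defaultCols)

# Colors actually assigned to the 3 cyl groups in a default boxplot
p = ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) +
  geom_boxplot()
usedCols = unique(layer_data(p)$colour)
print(usedCols)
```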

Enhanced mtcars visualization

We are on the hunt for three colors and three plotting characters to represent the 4, 6, and 8 cyl groups. Double-checking the color values listed in the colorblindFriendly object, we notice that the first three colors form a harmonious triad of secondary colors (green, orange, and purple, a very popular triad in ads and cartoons); likewise, Unicode includes a set of solid (negative) circled digits we can use to label our cyl groups:

intToUtf8(c(10105, 10107, 10109))
[1] "❹❻❽"

Combining color and Unicode features, our annotated plotting characters become ❹, ❻, and ❽.
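The three digits follow a pattern that can be computed: the dingbat negative circled digits run from U+2776 (❶) to U+277F (❿), so digit d sits at code point 0x2775 + d. A sketch that derives the shape codes from the cyl levels themselves (the variable names are illustrative, and the trick only works for values 1 to 10):

```r
# Dingbat negative circled digits: U+2776 is "1", so digit d is at 0x2775 + d
cylLevels = levels(factor(mtcars$cyl))       # "4" "6" "8"
cylCodes  = 0x2775 + as.integer(cylLevels)   # 10105 10107 10109
intToUtf8(cylCodes)

# scale_shape_manual() expects the codes with a negative sign
cylShapes = -cylCodes
```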

Next, we can store our favorite theme options (for controlling other appearance features of our visualization) by executing

my_theme = theme_classic() +
  theme(axis.line.x=element_line(color="gray20", linewidth=1.0),
        axis.line.y=element_line(color="gray20", linewidth=0.5),
        panel.grid.major.x=element_blank(),
        panel.grid.major.y=element_line(linetype="dashed", linewidth=0.1, color="gray20"),
        panel.grid.minor=element_blank(), 
        panel.border=element_rect(colour="gray50",fill=NA,linewidth=1.0),
        panel.background = element_rect(colour = "gray50", linewidth=1.0),
        text=element_text(size=12),
        plot.title=element_text(hjust=0.5),
        plot.subtitle=element_text(hjust=0.5),
        legend.box.spacing=unit(0, "pt"),
        legend.margin=margin(0,0,0,0),
        legend.spacing.x = unit(0.025,"cm"),
        legend.box="vertical",
        legend.spacing.y = unit(-0.1,"cm"),
        legend.position="bottom")

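Now we are ready to produce a much neater visualization. The real trick is the negative sign in front of the code points passed to scale_shape_manual(): R's graphics engine interprets a negative shape value as a Unicode code point, so -10105 is drawn as ❹.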

UNICODE = intToUtf8(c(127332,127325,127320,127314,127326,127315,127316))
mtcars = mtcars %>% mutate(cyl=factor(cyl))
ggsave("images/mtcarsEnhancedFig.png",
       ggplot(data=mtcars, aes(x=cyl, y=mpg)) +
         geom_boxplot(aes(fill=cyl), outlier.shape=NA, color="gray50") +
         stat_boxplot(geom='errorbar', color="gray50", width=0.5) +
         scale_fill_manual(values=rep("gray95",nlevels(mtcars$cyl)),guide="none") +
         geom_jitter(size=4, width=0.25, height=0, aes(color=cyl, shape=cyl)) +
         scale_color_manual(name="Cyl",values=c("#1B9E77","#D95F02","#7570B3")) +
         scale_shape_manual(name="Cyl",values=-c(10105,10107,10109)) +
         ggtitle(paste0("Enhanced plot with colorblind friendly design and ", UNICODE, " symbols")) +
         my_theme,
       width=7.5, height=7.5/1.618, units="in"
)
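The negative-sign convention is not specific to ggplot2: R's documentation for points() notes that, where supported by the OS, negative pch values specify a Unicode code point, and ggplot2's shape values end up drawn by the same graphics devices. A minimal base-graphics sketch, assuming a UTF-8 locale and a font that contains these glyphs (the temporary PNG file is only a convenient output target):

```r
# Negative pch values are read as Unicode code points (see ?points)
outFile = tempfile(fileext = ".png")
png(outFile, width = 600, height = 300)
plot(1:3, rep(1, 3),
     pch = -c(10105, 10107, 10109),               # circled digits 4, 6, and 8
     cex = 4,
     col = c("#1B9E77", "#D95F02", "#7570B3"),    # same triad as the plot above
     xlim = c(0.5, 3.5), axes = FALSE, xlab = "", ylab = "")
invisible(dev.off())
```

If the font lacks a glyph, the device typically draws a placeholder box instead, so it is worth checking the rendered file.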

Additional examples: plotting the Palmer Penguins 🐧🐧🐧

penguins is a data set shared in the palmerpenguins R package. Let's start by reproducing one visualization example found online:

library(palmerpenguins)
penguins = na.omit(penguins)

ggsave("images/penguinsFig0.png",
       ggplot(data=penguins, aes(x=flipper_length_mm, y=body_mass_g)) +
         geom_point(aes(color=species, shape=species), size=3, alpha=0.8) +
         scale_color_manual(values = c("darkorange","purple","cyan4")) +
         labs(title = "Penguin size, Palmer Station LTER",
              subtitle = "Flipper length and body mass for Adelie, Chinstrap, and Gentoo Penguins",
              x = "Flipper length (mm)",
              y = "Body mass (g)",
              color = "Penguin species",
              shape = "Penguin species") +
         # numeric legend.position is deprecated in ggplot2 >= 3.5.0 (use legend.position.inside)
         theme(legend.position = c(0.2, 0.7),
               plot.title.position = "plot",
               plot.caption = element_text(hjust = 0, face= "italic"),
               plot.caption.position = "plot"),
       width=7.5, height=7.5/1.618, units="in"
)

Let's work out a better enhancement for this plot. Firstly, we notice that the initials of the three species names (i.e., A, C, and G) are all different, which suggests using the Unicode characters 🅐🅒🅖. Secondly, we can map penguin gender (female and male) to colors. Lastly, we can run the following code to estimate the mean values of the two plotted variables for each Species-Gender group, and then overlay larger symbols to denote those averages:

sgmeans = penguins %>% 
  group_by(species, sex) %>% 
  summarise(flipper_length_mm=mean(flipper_length_mm, na.rm=TRUE),
            body_mass_g=mean(body_mass_g, na.rm=TRUE)) %>% 
  ungroup()

After updating our theme() options (the dashed horizontal grid lines are dropped and the legend margin and spacing are adjusted), we are ready to craft our first enhanced penguins visualization:

my_theme = theme_classic() +
  theme(axis.line.x=element_line(color="gray20", linewidth=1.0),
        axis.line.y=element_line(color="gray20", linewidth=0.5),
        panel.grid.major.x=element_blank(),
        panel.grid.minor=element_blank(), 
        panel.border=element_rect(colour="gray50",fill=NA,linewidth=1.0),
        panel.background = element_rect(colour = "gray50", linewidth=1.0),
        text=element_text(size=12),
        plot.title=element_text(hjust=0.5),
        plot.subtitle=element_text(hjust=0.5),
        legend.box.spacing=unit(0, "pt"),
        legend.margin=margin(5,0,0,0),
        legend.spacing.x = unit(0.025,"cm"),
        legend.box="vertical",
        legend.spacing.y = unit(0,"cm"),
        legend.position="bottom")
ggsave("images/penguinsFig1.png",
       ggplot(data=penguins, aes(x=flipper_length_mm, y=body_mass_g)) +
         geom_jitter(aes(color=sex, shape=species), width=0.25, height=0, size=2, alpha=0.40) +
         # larger symbols for the Species-Gender group means
         geom_point(data=sgmeans, aes(color=sex, shape=species), size=4, show.legend=FALSE) +
         scale_color_manual(name="Gender",values=c("#E7298A","#1B9E77")) +
         scale_shape_manual(name="Species",values=-c(127312,127314,127318)) +
         labs(title = "Penguin size, Palmer Station LTER",
              subtitle = "Flipper length and body mass for Species and Gender",
              x = "Flipper length (mm)",
              y = "Body mass (g)",
              color = "Gender",
              shape = "Species") +
         guides(colour=guide_legend(order=1), shape=guide_legend(order=2)) +
         my_theme,
       width=7.5, height=7.5/1.618, units="in"
)
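The second my_theme repeats most of the first one; only three settings change (the dashed horizontal grid lines disappear, legend.margin gains a 5-point top margin, and legend.spacing.y goes from -0.1 cm to 0). As an alternative sketch, not the approach used in this post, a derived theme can be built by adding a theme() call that holds just the differences, since settings added later take precedence. The base theme below is a trimmed-down stand-in for the post's settings, and mtcars is used so the example runs without palmerpenguins:

```r
library(ggplot2)   # linewidth in theme elements needs ggplot2 >= 3.4.0

# Trimmed-down stand-in for the shared base theme
baseTheme = theme_classic() +
  theme(panel.grid.major.y = element_line(linetype = "dashed", linewidth = 0.1, color = "gray20"),
        panel.border = element_rect(colour = "gray50", fill = NA, linewidth = 1.0),
        plot.title = element_text(hjust = 0.5),
        legend.margin = margin(0, 0, 0, 0),
        legend.position = "bottom")

# Derived theme: only the settings that differ are added on top
penguinTheme = baseTheme +
  theme(panel.grid.major.y = element_blank(),
        legend.margin = margin(5, 0, 0, 0),
        legend.spacing.y = unit(0, "cm"))

p = ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) + geom_point() + penguinTheme
outFile = tempfile(fileext = ".png")
ggsave(outFile, p, width = 5, height = 3, units = "in")
```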

Other Unicode characters useful to map gender

When mapping gender onto visualizations, we can use the standard Unicode characters ♀ and ♂ to represent female and male, respectively. Alternatively, chess lovers can borrow the Unicode characters ♛ and ♔. Moreover, when mapping the two values of a dichotomous factor, we can use an outlined Unicode character for one group and the corresponding solid character for the other (e.g., ♘ = group 1; ♞ = group 2).
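Code points for these symbols can be looked up from the characters themselves with utf8ToInt(). In the Miscellaneous Symbols block, each solid (black) chess piece sits exactly six positions after its outlined (white) counterpart, which makes outlined/solid pairs easy to compute. A sketch using \u escapes so the script stays ASCII-only:

```r
# Look up code points from characters (escapes keep the script ASCII-only)
genderCodes = utf8ToInt("\u2640\u2642")   # female sign, male sign: 9792 9794
chessCodes  = utf8ToInt("\u265B\u2654")   # black queen, white king: 9819 9812

# Outlined (white) and solid (black) chess pieces are 6 code points apart,
# e.g. white knight U+2658 and black knight U+265E
whiteKnight = 0x2658
blackKnight = whiteKnight + 6
intToUtf8(c(whiteKnight, blackKnight), multiple = TRUE)

# Negate the codes before passing them to scale_shape_manual(values = ...)
genderShapes = -genderCodes
```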

Let's build an enhanced scatter plot grouped by island and gender, using the standard gender symbols as plotting characters and mapping islands to colors.

penguinsPlot = ggplot(data=penguins, aes(y=flipper_length_mm, x=body_mass_g)) +
    geom_point(aes(color=island, shape=sex), size=3) +
    scale_color_manual(name="Island",values=c("#1B9E77","#D95F02","#6A3D9A")) +
    scale_shape_manual(name="Gender",values=-c(9792,9794)) +
    labs(title = "Penguin size, Palmer Station LTER",
         subtitle = "Flipper length and body mass for Island and Gender",
         y = "Flipper length (mm)",
         x = "Body mass (g)",
         color = "Island",
         shape = "Gender") +
    guides(colour=guide_legend(order=1), shape=guide_legend(order=2)) +
    my_theme
ggsave("images/penguinsFig2.png", penguinsPlot, width=7.5, height=7.5/1.618, units="in"
)

To add color contrast, we can include one more ggplot layer that overlays a translucent, island-colored solid circle using the Unicode character ●:

penguinsPlot = penguinsPlot +
    geom_point(aes(color=island), shape=-9679, size=6.5, alpha=0.15)
ggsave("images/penguinsFig3.png", penguinsPlot, width=7.5, height=7.5/1.618, units="in"
)

For chess lovers, the code and output would look like this:

penguinsPlot = ggplot(data=penguins, aes(y=flipper_length_mm, x=body_mass_g)) +
    geom_point(aes(color=island, shape=sex), size=2.5) +
    scale_color_manual(name="Island",values=c("#1B9E77","#D95F02","#6A3D9A")) +
    scale_shape_manual(name="Gender",values=-c(9819,9812)) +
    labs(title = "Penguin size, Palmer Station LTER",
         subtitle = "Flipper length and body mass for Island and Gender",
         y = "Flipper length (mm)",
         x = "Body mass (g)",
         color = "Island",
         shape = "Gender") +
    guides(colour=guide_legend(order=1), shape=guide_legend(order=2)) +
    my_theme
ggsave("images/penguinsFig4.png", penguinsPlot, width=7.5, height=7.5/1.618, units="in"
)

Smart Unicode annotation for 3 factors

In this final example, I want to illustrate one way to map 3 factors in a neat visualization; this time, colors will be used to map penguin species. First, we need to add an IslandGender column to the penguins data frame by running:

penguins = penguins %>% 
  mutate(IslandGender = paste0(island, "_", sex))

Solid circled letters can be used to represent female penguins, while empty squared letters can be used to represent male penguins. The final penguins visualization example is displayed below.

igShapes = c(127313,127281,127315,127283,127331,127299)
penguinsPlot = ggplot(data=penguins, aes(y=flipper_length_mm, x=body_mass_g)) +
    geom_point(aes(color=species, shape=IslandGender), size=2) +
    scale_color_manual(name="Species",values=c("#1B9E77","#D95F02","#6A3D9A")) +
    scale_shape_manual(name="Island_Gender",values=-igShapes) +
    labs(title = "Penguin size, Palmer Station LTER",
         subtitle = "Flipper length and body mass for Species, Island, and Gender",
         y = "Flipper length (mm)",
         x = "Body mass (g)",
         color = "Species",
         shape = "Island_Gender") +
    guides(colour=guide_legend(order=1), shape=guide_legend(order=2)) +
    my_theme
ggsave("images/penguinsFig5.png", penguinsPlot, width=7.5, height=7.5/1.618, units="in"
)
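Because scale_shape_manual() matches an unnamed values vector to the factor levels in order, igShapes has to line up with the alphabetical IslandGender levels (Biscoe_female, Biscoe_male, Dream_female, and so on). A sketch that derives the codes from the levels instead of hard-coding them; the island names are typed out so it runs without palmerpenguins, and the variable names are illustrative:

```r
# IslandGender levels as produced by paste0(island, "_", sex), sorted like factor()
igLevels = sort(paste0(rep(c("Biscoe", "Dream", "Torgersen"), each = 2), "_",
                       c("female", "male")))

# Initial of each island -> offset from "A"
initials = substr(igLevels, 1, 1)
offsets  = vapply(initials, utf8ToInt, integer(1), USE.NAMES = FALSE) - utf8ToInt("A")

# Females: negative circled letters (U+1F150 = A); males: squared letters (U+1F130 = A)
blockStart = ifelse(grepl("_female$", igLevels), 0x1F150, 0x1F130)
igShapesDerived = blockStart + offsets

# Named values are matched by name in scale_shape_manual(), so order no longer matters
names(igShapesDerived) = igLevels
```

Passing values = -igShapesDerived keeps the names, which guards against a silent mismatch if the level order ever changes.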

Closing notes

If reading this post has inspired some creative ideas, then my goal has been met. I want to close this post by sharing some technical details of the tools used to create this document.

The code to produce the HTML version of this document was written in the RStudio 2023.03.0-386 IDE (Windows 10/11 Zip/Tarballs available for free at https://posit.co/download/rstudio-desktop/). I used Quarto, a next-generation successor to R Markdown; before Quarto, I was always intimidated by the complex look of R Markdown code, but RStudio's Visual mode made it much easier for me to start using it.

To produce output with the Render button of the RStudio IDE, I had to make a minor change to one of the internal auxiliary R scripts (called execute.R) stored in the extracted folder. The path to it is displayed below.

Using Windows WordPad, I edited line 12, which originally reads:

oldwd <- setwd(dirname(rmarkdown:::abs_path(input)))

I replaced it with:

oldwd <- setwd(getwd())

After saving this change, the Render button started working as expected.

I encourage you to stay in touch and share your opinions.

Enjoy!!!