Thank you very much for reading this post. Today I want to share some tips for neatly designing creative visualizations using ggplot2. I place strong emphasis on taking full control of color, theme layout, and other graphical components; I firmly believe we should do as much as we can to help readers interpret visualizations easily. Some characteristics of coding for smart visuals are: selecting colors that convey psychological meaning (like using green and red for hope and danger, respectively), remembering that some readers might be colorblind (which calls for colorblind-friendly color coding), and choosing Unicode plotting characters (see more at Unicode search) that enhance communication.
Colorblind-friendly + Unicode, a great combo!
I always use the RStudio IDE to write visualization code. One of its great features is that, after you type a color name or hex code, the IDE previews how the color looks, as illustrated below.
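The palette previewed in the IDE is stored in a character vector named colorblindFriendly. The values below are copied from the post's source listing.

Its first 8, next 8 and last 12 hex codes appear to match the ColorBrewer Dark2, Set2 and Paired palettes. That is my observation, not a claim made in the original post. The base-R call col2rgb() is used here only as a sanity check that every entry is a valid R color.

```r
# Palette reproduced from the post's source listing (29 entries)
colorblindFriendly <- c(
  "#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E", "#E6AB02", "#A6761D", "#666666",
  "#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3", "#A6D854", "#FFD92F", "#E5C494", "#B3B3B3",
  "#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C", "#FDBF6F", "#FF7F00",
  "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928", "gray15"
)

# col2rgb() stops with an error on an invalid color, so this doubles as a validity
# check; it returns a 3 x n integer matrix with rows red, green and blue (0-255)
rgbValues <- grDevices::col2rgb(colorblindFriendly)
print(rgbValues[, 1:3])
```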
Using colorblind-friendly alternatives is much better than letting ggplot2 pick its default colors. Likewise, it is extremely easy, and very powerful, to use Unicode characters in R. For example, by calling intToUtf8 inside the paste0 function we can produce a smart Unicode annotation that looks like "Sm🄰rt 🅤🅝🅘🅒🅞🅓🅔".
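Here is a minimal base-R sketch of that annotation, using the code points from the post's source listing. intToUtf8() turns integer code points into one string, or into one string per code point with multiple = TRUE. utf8ToInt() goes the other way, which is handy for looking up the code point of a symbol copied from a Unicode search page.

```r
# The word UNICODE in negative (solid) circled capital letters, U+1F150 block
unicodeWord <- intToUtf8(c(127332, 127325, 127320, 127314, 127326, 127315, 127316))

# "Sm" + squared capital A (U+1F130, decimal 127280) + "rt " + the word above
annotation <- paste0("Sm", intToUtf8(127280), "rt ", unicodeWord)
print(annotation)

# multiple = TRUE returns one single-character string per code point
unicodeLetters <- intToUtf8(c(127332, 127325, 127320, 127314, 127326, 127315, 127316),
                            multiple = TRUE)
print(length(unicodeLetters))

# utf8ToInt() is the inverse: it recovers the code point of a character
print(utf8ToInt(intToUtf8(127280)))
```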
mtcars is one of the popular data sets shipped with R and is frequently used to illustrate ggplot2 features. A typical on-the-fly example is reproduced below.
```r
library(tidyverse)  # ggplot2 and the %>% pipe (loaded in the post's hidden setup chunk)

ggsave("images/mtcarsFig.png",
       mtcars %>%
         ggplot(aes(x = cyl, y = mpg, group = cyl, color = factor(cyl))) +
         geom_boxplot() +
         geom_jitter() +
         ggtitle("A typical example plot with default features"),
       width = 7.5, height = 7.5/1.618, units = "in")
```
As we can see, we let ggplot2 use its default colors and plotting characters, and we did not use any theme() options to make the visualization neater.
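To make the contrast with the defaults concrete, the sketch below prints two sets of colors side by side. The first is what ggplot2's default discrete color scale assigns to a three-level mapping such as factor(cyl); it comes from scales::hue_pal(). The second is the three colorblind-friendly colors the post adopts for cyl further on. It assumes the scales package is available, which is normally the case because ggplot2 depends on it.

```r
library(scales)

# Default discrete hue palette ggplot2 uses for a 3-level color mapping
defaultColors <- hue_pal()(3)

# First three entries of the post's colorblind-friendly palette, used later for cyl
chosenColors <- c("#1B9E77", "#D95F02", "#7570B3")

print(rbind(default = defaultColors, chosen = chosenColors))
```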
Enhanced mtcars visualization
We are on the hunt for three colors and three plotting characters to represent the 4, 6, and 8 cyl data groups. Looking again at the color values listed in the colorblindFriendly object, we notice that the first three colors form a harmonious triad of secondary colors (green, orange, and purple, a very popular triad in ads and cartoons). Likewise, Unicode includes a set of solid circled numbers we can use to label our cyl groups.
```r
paste(intToUtf8(c(10105, 10107, 10109)))
## [1] "❹❻❽"
```
Combining color and Unicode features, our annotated plotting characters become ❹, ❻, and ❽.
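The three code points follow a simple pattern. Unicode's Dingbats block stores the solid (negative) circled digits one to ten consecutively from U+2776 (decimal 10102), so digit n sits at 10101 + n.

The helper below, solidCircledDigit(), is a hypothetical illustration of my own rather than part of the original post. It only covers digits 1 to 10.

```r
# Hypothetical helper (not from the original post): code point of the solid
# ("negative") circled digit n, for n in 1..10 (U+2776 .. U+277F)
solidCircledDigit <- function(n) {
  stopifnot(all(n >= 1 & n <= 10))
  0x2775 + n
}

cylCodes <- solidCircledDigit(c(4, 6, 8))
print(cylCodes)              # the same values the post passes to intToUtf8()
print(intToUtf8(cylCodes))
```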
Next, we can store our favorite theme options, which control the remaining appearance features of our visualization, in a reusable theme object named my_theme that is added to each plot.
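The theme object and the enhanced plot that this section builds up to are defined in the post's source listing. The block below restates them as one self-contained sketch.

As the source puts it, the real trick is the negative sign in front of the values passed to scale_shape_manual(). Negative shape values are interpreted as Unicode code points, where the graphics device supports them.

The sketch differs from the source in a few ways:

- A constant fill = "gray95" replaces the fill mapping plus scale_fill_manual().
- The title is omitted.
- ggsave() is left commented out.

It assumes ggplot2 3.4.0 or later, because of the linewidth arguments.

```r
library(ggplot2)

# Theme options reproduced from the post's source listing
my_theme <- theme_classic() +
  theme(axis.line.x = element_line(color = "gray20", linewidth = 1.0),
        axis.line.y = element_line(color = "gray20", linewidth = 0.5),
        panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_line(linetype = "dashed", linewidth = 0.1, color = "gray20"),
        panel.grid.minor = element_blank(),
        panel.border = element_rect(colour = "gray50", fill = NA, linewidth = 1.0),
        panel.background = element_rect(colour = "gray50", linewidth = 1.0),
        text = element_text(size = 12),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.box.spacing = unit(0, "pt"),
        legend.margin = margin(0, 0, 0, 0),
        legend.spacing.x = unit(0.025, "cm"),
        legend.box = "vertical",
        legend.spacing.y = unit(-0.1, "cm"),
        legend.position = "bottom")

# cyl must be a factor so each group gets its own color and shape
mtcarsCyl <- transform(mtcars, cyl = factor(cyl))
cylColors <- c("#1B9E77", "#D95F02", "#7570B3")
cylShapes <- -c(10105, 10107, 10109)   # negative values = Unicode code points

mtcarsEnhanced <- ggplot(mtcarsCyl, aes(x = cyl, y = mpg)) +
  geom_boxplot(fill = "gray95", outlier.shape = NA, color = "gray50") +
  stat_boxplot(geom = "errorbar", color = "gray50", width = 0.5) +
  geom_jitter(aes(color = cyl, shape = cyl), size = 4, width = 0.25, height = 0) +
  scale_color_manual(name = "Cyl", values = cylColors) +
  scale_shape_manual(name = "Cyl", values = cylShapes) +
  my_theme

# ggsave("mtcarsEnhancedFig.png", mtcarsEnhanced, width = 7.5, height = 7.5/1.618, units = "in")
```

Using the same name ("Cyl") in both manual scales lets ggplot2 merge color and shape into a single legend.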
Additional examples: plotting the Palmer Penguins 🐧🐧🐧
penguins is a data set shipped with the palmerpenguins R package. Let's start by reproducing one visualization example found online.
```r
library(palmerpenguins)
penguins = na.omit(penguins)
ggsave("images/penguinsFig0.png",
       ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
         geom_point(aes(color = species, shape = species), size = 3, alpha = 0.8) +
         scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
         labs(title = "Penguin size, Palmer Station LTER",
              subtitle = "Flipper length and body mass for Adelie, Chinstrap, and Gentoo Penguins",
              x = "Flipper length (mm)",
              y = "Body mass (g)",
              color = "Penguin species",
              shape = "Penguin species") +
         # a numeric legend.position is deprecated since ggplot2 3.5.0 (see legend.position.inside)
         theme(legend.position = c(0.2, 0.7),
               plot.title.position = "plot",
               plot.caption = element_text(hjust = 0, face = "italic"),
               plot.caption.position = "plot"),
       width = 7.5, height = 7.5/1.618, units = "in")
```
Let's work out a better enhancement for this plot. First, we notice that each penguin species name starts with a different initial (A, C, and G); this suggests using the Unicode characters 🅐🅒🅖. Second, we can map penguin gender to colors: female and male. Lastly, we can estimate the Species-Gender group means of the two plotted variables (stored in a data frame named sgmeans) and overlay larger symbols to denote these averages.
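The group means are computed with dplyr in the post's source listing. The version below restates that step as a self-contained block and assumes the dplyr and palmerpenguins packages are installed.

As in the post, na.omit() drops incomplete rows first, so every species has both a female and a male group. `.groups = "drop"` (dplyr 1.0.0 or later) plays the role of the original ungroup() call.

```r
library(dplyr)
library(palmerpenguins)

# Drop rows with missing values, as the post does before plotting
penguinsClean <- na.omit(palmerpenguins::penguins)

# Mean flipper length and body mass per Species-Gender group; the column names are
# kept so the result can be overlaid with the same aes() mapping as the raw data
sgmeans <- penguinsClean %>%
  group_by(species, sex) %>%
  summarise(flipper_length_mm = mean(flipper_length_mm),
            body_mass_g = mean(body_mass_g),
            .groups = "drop")

print(sgmeans)
```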
After updating our theme() options, we are ready to craft our first enhanced penguins visualization
```r
my_theme = theme_classic() +
  theme(axis.line.x = element_line(color = "gray20", linewidth = 1.0),
        axis.line.y = element_line(color = "gray20", linewidth = 0.5),
        panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_rect(colour = "gray50", fill = NA, linewidth = 1.0),
        panel.background = element_rect(colour = "gray50", linewidth = 1.0),
        text = element_text(size = 12),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.box.spacing = unit(0, "pt"),
        legend.margin = margin(5, 0, 0, 0),
        legend.spacing.x = unit(0.025, "cm"),
        legend.box = "vertical",
        legend.spacing.y = unit(0, "cm"),
        legend.position = "bottom")

ggsave("images/penguinsFig1.png",
       ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
         # individual penguins: small, semi-transparent, jittered horizontally
         geom_jitter(aes(color = sex, shape = species), width = 0.25, height = 0, size = 2, alpha = 0.40) +
         # Species-Gender means (sgmeans) drawn as larger symbols
         geom_point(data = sgmeans, aes(color = sex, shape = species), size = 4, show.legend = FALSE) +
         # the scale names set the legend titles
         scale_color_manual(name = "Gender", values = c("#E7298A", "#1B9E77")) +
         # negative values are read as Unicode code points: 🅐 🅒 🅖
         scale_shape_manual(name = "Species", values = -c(127312, 127314, 127318)) +
         labs(title = "Penguin size, Palmer Station LTER",
              subtitle = "Flipper length and body mass for Species and Gender",
              x = "Flipper length (mm)",
              y = "Body mass (g)") +
         guides(colour = guide_legend(order = 1), shape = guide_legend(order = 2)) +
         my_theme,
       width = 7.5, height = 7.5/1.618, units = "in")
```
Other Unicode characters useful to map gender
When mapping gender onto visualizations, we can use the standard Unicode characters ♀ and ♂ to represent female and male, respectively. Alternatively, chess lovers can borrow the Unicode characters ♛ and ♔. Moreover, when mapping the two values of a dichotomous factor, we can use an empty Unicode character for one group and the corresponding solid Unicode character for the other (e.g., ♘ = group 1; ♞ = group 2).
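The code points behind these symbols can be checked in base R. The sketch below converts each pair into characters while keeping the group names. The helper name toGlyphs() is a hypothetical choice of mine.

In ggplot2 the same values are passed with a negative sign to scale_shape_manual(), as the plots in this section do.

```r
# Code points of the symbols mentioned in the text
genderSymbols <- c(female = 9792, male = 9794)     # U+2640, U+2642
chessSymbols  <- c(female = 9819, male = 9812)     # black queen U+265B, white king U+2654
knightPair    <- c(group1 = 9816, group2 = 9822)   # white (empty) U+2658, black (solid) U+265E

# Hypothetical helper: one character per code point, names preserved
toGlyphs <- function(codes) setNames(intToUtf8(codes, multiple = TRUE), names(codes))

print(toGlyphs(genderSymbols))
print(toGlyphs(chessSymbols))
print(toGlyphs(knightPair))

# In ggplot2, e.g.: scale_shape_manual(values = -genderSymbols)
```

Keeping the names is a small design choice. Named values passed to scale_shape_manual() are matched to factor levels such as "female" and "male" by name rather than by position.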
Let's build an enhanced scatter plot grouped by island and gender, using the standard gender symbols and mapping islands to colors.
```r
penguinsPlot = ggplot(data = penguins, aes(y = flipper_length_mm, x = body_mass_g)) +
  geom_point(aes(color = island, shape = sex), size = 3) +
  # scale names set the legend titles (they take precedence over labs())
  scale_color_manual(name = "Island", values = c("#1B9E77", "#D95F02", "#6A3D9A")) +
  scale_shape_manual(name = "Gender", values = -c(9792, 9794)) +   # ♀ ♂
  labs(title = "Penguin size, Palmer Station LTER",
       subtitle = "Flipper length and body mass for Island and Gender",
       y = "Flipper length (mm)",
       x = "Body mass (g)") +
  guides(colour = guide_legend(order = 1), shape = guide_legend(order = 2)) +
  my_theme
ggsave("images/penguinsFig2.png", penguinsPlot, width = 7.5, height = 7.5/1.618, units = "in")
```
To improve color contrast, we can add one more ggplot layer that overlays a faint, island-colored solid circle using the Unicode character ●.
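The overlay layer comes from the post's source listing. The sketch below rebuilds a minimal version of the gender-symbol plot so the block runs on its own. It assumes ggplot2 and palmerpenguins are installed, and it leaves out the titles, the theme and the ggsave() call.

Because the circle layer is added after the symbols, it is drawn on top of them. Its low alpha keeps the symbols readable.

```r
library(ggplot2)
library(palmerpenguins)

penguinsClean <- na.omit(palmerpenguins::penguins)
islandColors <- c("#1B9E77", "#D95F02", "#6A3D9A")

# Base layer: gender symbols (U+2640, U+2642) colored by island
penguinsPlot <- ggplot(penguinsClean, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(aes(color = island, shape = sex), size = 3) +
  scale_color_manual(name = "Island", values = islandColors) +
  scale_shape_manual(name = "Gender", values = -c(9792, 9794))

# Extra layer from the post: a large, mostly transparent solid circle
# (U+25CF, code point 9679) in the island color, drawn over each symbol
penguinsPlot <- penguinsPlot +
  geom_point(aes(color = island), shape = -9679, size = 6.5, alpha = 0.15)
```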
For chess lovers, the code would look like this:
```r
penguinsPlot = ggplot(data = penguins, aes(y = flipper_length_mm, x = body_mass_g)) +
  geom_point(aes(color = island, shape = sex), size = 2.5) +
  # scale names set the legend titles (they take precedence over labs())
  scale_color_manual(name = "Island", values = c("#1B9E77", "#D95F02", "#6A3D9A")) +
  scale_shape_manual(name = "Gender", values = -c(9819, 9812)) +   # ♛ ♔
  labs(title = "Penguin size, Palmer Station LTER",
       subtitle = "Flipper length and body mass for Island and Gender",
       y = "Flipper length (mm)",
       x = "Body mass (g)") +
  guides(colour = guide_legend(order = 1), shape = guide_legend(order = 2)) +
  my_theme
ggsave("images/penguinsFig4.png", penguinsPlot, width = 7.5, height = 7.5/1.618, units = "in")
```
Smart Unicode annotation for 3 factors
In this final example, I want to illustrate one way to map three factors onto a neat visualization; this time, colors are used to map penguin species. First, we need to add an IslandGender column, combining island and sex, to the penguins data frame.
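The IslandGender column can be added with dplyr::mutate(), as in the post's source listing. This self-contained sketch assumes dplyr and palmerpenguins are installed.

It also prints the resulting groups in alphabetical order. For a character column, that is the order in which an unnamed vector passed to scale_shape_manual() is matched to the groups.

```r
library(dplyr)
library(palmerpenguins)

penguinsClean <- na.omit(palmerpenguins::penguins)

# Combine island and sex into one grouping variable, e.g. "Biscoe_female";
# paste0() uses the factor labels, not the underlying integer codes
penguinsClean <- penguinsClean %>%
  mutate(IslandGender = paste0(island, "_", sex))

print(sort(unique(penguinsClean$IslandGender)))
```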
Solid circled letters can represent female penguins and empty squared letters can represent male penguins, with each letter being the initial of the penguin's island (B, D, or T). The final penguins visualization example is displayed below.
```r
# 🅑 🄱 🅓 🄳 🅣 🅃: solid circled letter = female, squared letter = male (island initials)
igShapes = c(127313, 127281, 127315, 127283, 127331, 127299)
penguinsPlot = ggplot(data = penguins, aes(y = flipper_length_mm, x = body_mass_g)) +
  geom_point(aes(color = species, shape = IslandGender), size = 2) +
  # scale names set the legend titles
  scale_color_manual(name = "Species", values = c("#1B9E77", "#D95F02", "#6A3D9A")) +
  scale_shape_manual(name = "Island_Gender", values = -igShapes) +
  labs(title = "Penguin size, Palmer Station LTER",
       subtitle = "Flipper length and body mass for Island and Gender",
       y = "Flipper length (mm)",
       x = "Body mass (g)") +
  guides(colour = guide_legend(order = 1), shape = guide_legend(order = 2)) +
  my_theme
ggsave("images/penguinsFig5.png", penguinsPlot, width = 7.5, height = 7.5/1.618, units = "in")
```
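The six code points in igShapes are not arbitrary. Each island initial (B, D, T) appears once in the negative (solid) circled capital letter block, which starts at U+1F150 with A. It appears once more in the squared capital letter block, which starts at U+1F130. The same circled block also supplies the species initials 🅐🅒🅖 used earlier.

The helpers below rebuild both vectors from letters in base R. The names negativeCircledLetter() and squaredLetter() are hypothetical ones of my own, not from the post. Because unnamed values are matched to the groups in level order, the female/male pairs are interleaved to follow the alphabetical order of IslandGender.

```r
# Hypothetical helpers (not from the original post) for two blocks of capital letters:
#   negative circled A-Z: U+1F150 .. U+1F169 (solid circle)
#   squared          A-Z: U+1F130 .. U+1F149 (empty square)
letterIndex <- function(ch) match(ch, LETTERS) - 1
negativeCircledLetter <- function(ch) 0x1F150 + letterIndex(ch)
squaredLetter <- function(ch) 0x1F130 + letterIndex(ch)

# Species initials: A(delie), C(hinstrap), G(entoo)
speciesCodes <- negativeCircledLetter(c("A", "C", "G"))

# Island initials B(iscoe), D(ream), T(orgersen): circled (female) then squared (male);
# rbind() + as.vector() interleaves the two rows column by column
islandInitials <- c("B", "D", "T")
igShapes <- as.vector(rbind(negativeCircledLetter(islandInitials),
                            squaredLetter(islandInitials)))

igLevels <- c("Biscoe_female", "Biscoe_male", "Dream_female",
              "Dream_male", "Torgersen_female", "Torgersen_male")
igTable <- data.frame(group = igLevels, codePoint = igShapes,
                      glyph = intToUtf8(igShapes, multiple = TRUE),
                      stringsAsFactors = FALSE)
print(speciesCodes)
print(igTable)
```

You could instead pass a named vector, for example values = setNames(-igShapes, igLevels). That ties each glyph to its group explicitly instead of relying on level order.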
Closing notes
If this post has inspired some creative ideas, then my goal has been met. I want to close by sharing some technical details of the tools used to produce this document.
The HTML version of this document was produced with the RStudio 2023.03.0-386 IDE (Windows 10/11 Zip/Tarballs are available for free at https://posit.co/download/rstudio-desktop/). I used Quarto, a next-generation R Markdown tool. Before Quarto, I was always intimidated by the complex look of R Markdown code, but RStudio's Visual mode made it much easier for me to start using it.
To produce output with the Render button of RStudio's IDE, I had to make a minor change to one of the internal auxiliary R scripts (called execute.R) stored in the extracted folder. The path to it is displayed below.
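The change itself is recorded in the post's source listing. Using Windows WordPad, the author edited line 12 of execute.R, replacing the first line below with the second.

This appears to be a local workaround for the tool versions mentioned above rather than an official fix. It may need to be reapplied, or may no longer be needed, after an upgrade.

```r
# Original line 12 of execute.R
oldwd <- setwd(dirname(rmarkdown:::abs_path(input)))

# Replacement used by the author
oldwd <- setwd(getwd())
```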
After saving this change, the Render button started working as expected.
I encourage you to stay in touch and share your opinions.
Enjoy!!!
Source Code
````markdown
---
title: "`r paste0('Sm', intToUtf8(127280),'rt')` plotting with `r intToUtf8(c(127332,127325,127320,127314,127326,127315,127316))`"
author: James Silva Garcia
date: "`r Sys.Date()`"
format:
  html:
    code-tools: true
    code-fold: show
    code-link: true
    code-summary: "Hide code"
    code-copy: true
    code-line-numbers: true
    link-external-newwindow: true
link-citations: true
execute:
  eval: true
  include: true
  warning: false
toc: true
toc-depth: 2
toc-location: body
toc-title: Contents
image: images/unicodePlottingImage.png
---

```{r}
#| include: false
library(tidyverse)
#--- set.seed() needed to get plots with consistent jittered points when using geom_jitter()
set.seed(37676)
```

## Welcome

Thank you very much for reading this post. Today I want to share some tips for neatly designing creative visualizations using ggplot2. I place strong emphasis on taking full control of color, theme layout, and other graphical components; I firmly believe we should do as much as we can to help readers interpret visualizations easily. Some characteristics of coding for smart visuals are: selecting colors that convey psychological meaning (like using [**green**]{style="color:#33A02C"} and [**red**]{style="color:#E31A1C"} for [**hope**]{style="color:#33A02C"} and [**danger**]{style="color:#E31A1C"}, respectively), remembering that some readers might be colorblind (which calls for colorblind-friendly color coding), and choosing Unicode plotting characters (see more at [Unicode search](http://xahlee.info/comp/unicode_index.html?q=)) that enhance communication.

## Colorblind-friendly + Unicode, a great combo!

I always use the RStudio IDE to write visualization code. One of its great features is that, after you type a color name or hex code, the IDE previews how the color looks, as illustrated below

```{r}
#| eval: false
colorblindFriendly = c('#1B9E77','#D95F02','#7570B3','#E7298A','#66A61E','#E6AB02','#A6761D','#666666',
                       '#66C2A5','#FC8D62','#8DA0CB','#E78AC3','#A6D854','#FFD92F','#E5C494','#B3B3B3',
                       '#A6CEE3','#1F78B4','#B2DF8A','#33A02C','#FB9A99','#E31A1C','#FDBF6F','#FF7F00',
                       '#CAB2D6','#6A3D9A','#FFFF99','#B15928','gray15')
```

Using colorblind-friendly alternatives is much better than letting ggplot2 pick its default colors. Likewise, it is extremely easy, and very powerful, to use Unicode characters in R. For example, by calling intToUtf8 inside the paste0 function we can produce a smart Unicode annotation that looks like

```{r}
paste0('Sm', intToUtf8(127280), 'rt ',
       intToUtf8(c(127332,127325,127320,127314,127326,127315,127316)))
```

## Typical mtcars visualization example

**mtcars** is one of the popular data sets shipped with R and is frequently used to illustrate ggplot2 features. A typical on-the-fly example is reproduced below

```{r links}
ggsave("images/mtcarsFig.png",
       mtcars %>%
         ggplot(aes(x = cyl, y = mpg, group = cyl, color = factor(cyl))) +
         geom_boxplot() +
         geom_jitter() +
         ggtitle("A typical example plot with default features"),
       width = 7.5, height = 7.5/1.618, units = "in")
```

As we can see, we let ggplot2 use its default colors and plotting characters, and we did not use any theme() options to make the visualization neater.

## Enhanced mtcars visualization

We are on the hunt for three colors and three plotting characters to represent the 4, 6, and 8 **cyl** data groups. Looking again at the color values listed in the colorblindFriendly object, we notice that the first three colors form a harmonious triad of secondary colors (i.e., [**green**]{style="color:#1B9E77"}-[**orange**]{style="color:#D95F02"}-[**purple**]{style="color:#7570B3"}, a very popular triad in ads and cartoons); likewise, Unicode includes a set of solid circled numbers we can use to label our **cyl** groups

```{r}
paste(intToUtf8(c(10105, 10107, 10109)))
```

Combining color and Unicode features, our annotated plotting characters become [**`r intToUtf8(10105)`**]{style="color:#1B9E77"}, [**`r intToUtf8(10107)`**]{style="color:#D95F02"}, and [**`r intToUtf8(10109)`**]{style="color:#7570B3"}.

Next, we can store our favorite theme options (for controlling other appearance features of our visualization) by executing

```{r}
my_theme = theme_classic() +
  theme(axis.line.x = element_line(color = "gray20", linewidth = 1.0),
        axis.line.y = element_line(color = "gray20", linewidth = 0.5),
        panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_line(linetype = "dashed", linewidth = 0.1, color = "gray20"),
        panel.grid.minor = element_blank(),
        panel.border = element_rect(colour = "gray50", fill = NA, linewidth = 1.0),
        panel.background = element_rect(colour = "gray50", linewidth = 1.0),
        text = element_text(size = 12),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.box.spacing = unit(0, "pt"),
        legend.margin = margin(0, 0, 0, 0),
        legend.spacing.x = unit(0.025, "cm"),
        legend.box = "vertical",
        legend.spacing.y = unit(-0.1, "cm"),
        legend.position = "bottom")
```

Now, we are ready to produce a much neater visualization (the real trick is to put a negative sign before the values passed to scale_shape_manual(), so that they are read as Unicode code points)

```{r}
UNICODE = intToUtf8(c(127332,127325,127320,127314,127326,127315,127316))
mtcars = mtcars %>% mutate(cyl = factor(cyl))
ggsave("images/mtcarsEnhancedFig.png",
       ggplot(data = mtcars, aes(x = cyl, y = mpg)) +
         geom_boxplot(aes(fill = cyl), outlier.shape = NA, color = "gray50") +
         stat_boxplot(geom = 'errorbar', color = "gray50", width = 0.5) +
         scale_fill_manual(values = rep("gray95", nlevels(mtcars$cyl)), guide = "none") +
         geom_jitter(size = 4, width = 0.25, height = 0, aes(color = cyl, shape = cyl)) +
         scale_color_manual(name = "Cyl", values = c("#1B9E77", "#D95F02", "#7570B3")) +
         scale_shape_manual(name = "Cyl", values = -c(10105, 10107, 10109)) +
         ggtitle(paste0("Enhanced plot with colorblind friendly design and ", UNICODE, " symbols")) +
         my_theme,
       width = 7.5, height = 7.5/1.618, units = "in")
```

## Additional examples: plotting the Palmer Penguins `r intToUtf8(rep(128039,3))`

**penguins** is a data set shipped with the palmerpenguins R package. Let's start by reproducing one visualization example found online

```{r}
library(palmerpenguins)
penguins = na.omit(penguins)
ggsave("images/penguinsFig0.png",
       ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
         geom_point(aes(color = species, shape = species), size = 3, alpha = 0.8) +
         scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
         labs(title = "Penguin size, Palmer Station LTER",
              subtitle = "Flipper length and body mass for Adelie, Chinstrap, and Gentoo Penguins",
              x = "Flipper length (mm)",
              y = "Body mass (g)",
              color = "Penguin species",
              shape = "Penguin species") +
         theme(legend.position = c(0.2, 0.7),
               plot.title.position = "plot",
               plot.caption = element_text(hjust = 0, face = "italic"),
               plot.caption.position = "plot"),
       width = 7.5, height = 7.5/1.618, units = "in")
```

Let's work out a better enhancement for this plot. First, we notice that each penguin species name starts with a different initial (A, C, and G); this suggests using the Unicode characters `r intToUtf8(c(127312,127314,127318))`. Second, we can map penguin gender to colors: [**female**]{style="color:#E7298A"} and [**male**]{style="color:#1B9E77"}. Lastly, we can also run the following code to estimate the Species-Gender group means of the two plotted variables and then overlay larger symbols to denote these averages

```{r}
sgmeans = penguins %>%
  group_by(species, sex) %>%
  summarise(flipper_length_mm = mean(flipper_length_mm, na.rm = TRUE),
            body_mass_g = mean(body_mass_g, na.rm = TRUE)) %>%
  ungroup()
```

After updating our theme() options, we are ready to craft our first enhanced penguins visualization

```{r}
my_theme = theme_classic() +
  theme(axis.line.x = element_line(color = "gray20", linewidth = 1.0),
        axis.line.y = element_line(color = "gray20", linewidth = 0.5),
        panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_rect(colour = "gray50", fill = NA, linewidth = 1.0),
        panel.background = element_rect(colour = "gray50", linewidth = 1.0),
        text = element_text(size = 12),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.box.spacing = unit(0, "pt"),
        legend.margin = margin(5, 0, 0, 0),
        legend.spacing.x = unit(0.025, "cm"),
        legend.box = "vertical",
        legend.spacing.y = unit(0, "cm"),
        legend.position = "bottom")
ggsave("images/penguinsFig1.png",
       ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
         geom_jitter(aes(color = sex, shape = species), width = 0.25, height = 0, size = 2, alpha = 0.40) +
         geom_point(data = sgmeans, aes(color = sex, shape = species), size = 4, show.legend = FALSE) +
         scale_color_manual(name = "Gender", values = c("#E7298A", "#1B9E77")) +
         scale_shape_manual(name = "Species", values = -c(127312, 127314, 127318)) +
         labs(title = "Penguin size, Palmer Station LTER",
              subtitle = "Flipper length and body mass for Species and Gender",
              x = "Flipper length (mm)",
              y = "Body mass (g)") +
         guides(colour = guide_legend(order = 1), shape = guide_legend(order = 2)) +
         my_theme,
       width = 7.5, height = 7.5/1.618, units = "in")
```

## Other Unicode characters useful to map gender

When mapping gender onto visualizations, we can use the standard Unicode characters `r intToUtf8(9792)` and `r intToUtf8(9794)` to represent **female** and **male**, respectively. Alternatively, chess lovers can borrow the Unicode characters `r intToUtf8(9819)` and `r intToUtf8(9812)`. Moreover, when mapping the two values of a dichotomous factor, we can use an empty Unicode character for one group and the corresponding solid Unicode character for the other (e.g., `r intToUtf8(9816)` = group 1; `r intToUtf8(9822)` = group 2).

Let's build an enhanced scatter plot grouped by island and gender, using the standard gender symbols and mapping islands to colors.

```{r}
penguinsPlot = ggplot(data = penguins, aes(y = flipper_length_mm, x = body_mass_g)) +
  geom_point(aes(color = island, shape = sex), size = 3) +
  scale_color_manual(name = "Island", values = c("#1B9E77", "#D95F02", "#6A3D9A")) +
  scale_shape_manual(name = "Gender", values = -c(9792, 9794)) +
  labs(title = "Penguin size, Palmer Station LTER",
       subtitle = "Flipper length and body mass for Island and Gender",
       y = "Flipper length (mm)",
       x = "Body mass (g)") +
  guides(colour = guide_legend(order = 1), shape = guide_legend(order = 2)) +
  my_theme
ggsave("images/penguinsFig2.png", penguinsPlot, width = 7.5, height = 7.5/1.618, units = "in")
```

To improve color contrast, we can include one more ggplot layer that overlays an island-colored solid circle using the Unicode character `r intToUtf8(9679)`

```{r}
penguinsPlot = penguinsPlot +
  geom_point(aes(color = island), shape = -9679, size = 6.5, alpha = 0.15)
ggsave("images/penguinsFig3.png", penguinsPlot, width = 7.5, height = 7.5/1.618, units = "in")
```

The code for chess lovers would look like this

```{r}
penguinsPlot = ggplot(data = penguins, aes(y = flipper_length_mm, x = body_mass_g)) +
  geom_point(aes(color = island, shape = sex), size = 2.5) +
  scale_color_manual(name = "Island", values = c("#1B9E77", "#D95F02", "#6A3D9A")) +
  scale_shape_manual(name = "Gender", values = -c(9819, 9812)) +
  labs(title = "Penguin size, Palmer Station LTER",
       subtitle = "Flipper length and body mass for Island and Gender",
       y = "Flipper length (mm)",
       x = "Body mass (g)") +
  guides(colour = guide_legend(order = 1), shape = guide_legend(order = 2)) +
  my_theme
ggsave("images/penguinsFig4.png", penguinsPlot, width = 7.5, height = 7.5/1.618, units = "in")
```

## Smart Unicode annotation for 3 factors

In this final example, I want to illustrate one way to map three factors onto a neat visualization; this time, colors are used to map penguin species. First, we need to add the IslandGender column to the penguins data frame by running

```{r}
penguins = penguins %>% mutate(IslandGender = paste0(island, "_", sex))
```

Solid circled letters can represent **female** penguins and empty squared letters can represent **male** penguins. The final penguins visualization example is displayed below.

```{r}
igShapes = c(127313, 127281, 127315, 127283, 127331, 127299)
penguinsPlot = ggplot(data = penguins, aes(y = flipper_length_mm, x = body_mass_g)) +
  geom_point(aes(color = species, shape = IslandGender), size = 2) +
  scale_color_manual(name = "Species", values = c("#1B9E77", "#D95F02", "#6A3D9A")) +
  scale_shape_manual(name = "Island_Gender", values = -igShapes) +
  labs(title = "Penguin size, Palmer Station LTER",
       subtitle = "Flipper length and body mass for Island and Gender",
       y = "Flipper length (mm)",
       x = "Body mass (g)") +
  guides(colour = guide_legend(order = 1), shape = guide_legend(order = 2)) +
  my_theme
ggsave("images/penguinsFig5.png", penguinsPlot, width = 7.5, height = 7.5/1.618, units = "in")
```

## Closing notes

If this post has inspired some creative ideas, then my goal has been met. I want to close by sharing some technical details of the tools used to produce this document.

The HTML version of this document was produced with the RStudio 2023.03.0-386 IDE (Windows 10/11 Zip/Tarballs are available for free at <https://posit.co/download/rstudio-desktop/>). I used [Quarto](https://quarto.org/), a next-generation R Markdown tool. Before Quarto, I was always intimidated by the complex look of R Markdown code, but RStudio's Visual mode made it much easier for me to start using it.

To produce output with the Render button of RStudio's IDE, I had to make a minor change to one of the internal auxiliary R scripts (called **execute.R**) stored in the extracted folder. The path to it is displayed below

{width="450"}

Using Windows WordPad, I edited line number 12

```{r}
#| eval: false
oldwd <- setwd(dirname(rmarkdown:::abs_path(input)))
```

to replace it with

```{r}
#| eval: false
oldwd <- setwd(getwd())
```

After saving this change, the Render button started working as expected.

I encourage you to stay in touch and share your opinions.

Enjoy!!!
````