Why use custom themes?

ggplot2, a popular R package for visualization (used here interchangeably with ggplot), produces graphics with a recognizable default “style”. While the “baked-in” ggplot theme settings can work well from a design standpoint, the theme() function of ggplot also offers a great deal of flexibility for customization. Almost any non-data component of a plot - axes, titles, subtitles, gridlines, and more - can be specified within this function.
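To make that concrete, here is a minimal sketch of a few theme() arguments in action. It uses R's built-in mtcars data purely for illustration (not the data from this tutorial), and the specific sizes and colors are arbitrary choices rather than recommendations.

```r
library(ggplot2)

# Built-in mtcars data, used only to have something to plot
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy by weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon") +
  # theme() controls non-data elements; each argument names one component
  theme(
    plot.title = element_text(face = "bold", size = 14), # title text
    axis.title = element_text(colour = "grey30"),        # both axis titles
    panel.grid.minor = element_blank(),                  # remove minor gridlines
    panel.background = element_rect(fill = "white")      # plot panel background
  )

p
```

Each component is set with an element function: element_text() for text, element_rect() for backgrounds and borders, element_line() for lines, and element_blank() to remove a component entirely.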

This customization can help you establish a personal or professional brand for your plots. Many organizations leverage ggplot themes in their publications to create a consistent style across graphics that they produce. The BBC Visual and Data Journalism team is one example; they worked with their design team to code an R package that lets anyone create graphics in their pre-determined style. Even within organizations that have dedicated marketing and branding teams, making an effort to match data science output to a company brand can help your results stand out to organization stakeholders.

Branding isn’t just limited to organizations - it can also be a personal tool. Styling all your plots consistently can build continuity across presentations, blogs, or other personal products. Customizing ggplot outputs in this way also demonstrates a deeper familiarity with the tool and shows a greater attention to design detail.


Example: custom theme for the Georgetown Public Policy Review

Background on the Georgetown Public Policy Review

This year I’ve been leading the data visualization team at the Georgetown Public Policy Review (GPPR), a student-run publication at the McCourt School of Public Policy. When I took over the team, one problem was immediately clear: while we did have a style guide (link to the style guide as of October 31, 2019), that guide was applied loosely if at all, and really only within specific tools. As a result, our branding was inconsistent across outputs and our plots looked less professional.

Historically, the GPPR data viz team has primarily used Tableau, because it’s relatively easy for beginners to pick up in comparison to other (potentially more flexible) tools like ggplot in R. When I joined the team last year, most of my data viz experience was in R, but the senior editors discouraged me from using it because customizing a visualization in R was so much more time-consuming than in Tableau. I was told that doing so might even require me to learn Adobe Illustrator, which was not free (rendering moot one of my favorite advantages of open-source software like R) and also had a relatively steep learning curve compared to Tableau.

Using Tableau worked well for me, and I’m grateful that I got to learn another visualization tool. Still, when I took over the data visualization team this year it became apparent that not all of the new team members wanted to stick to it. As I learned more about ggplot, I also realized that it was much more flexible than I had initially thought, and that much of GPPR’s style guide customization could really be included within ggplot’s own theme settings and wouldn’t require external software. Taking the time to create a custom theme could take a lot of the startup costs out of ggplot, particularly for team members who already had a background in R, and expand the team’s toolkit.

Example Plot

Let’s start by recreating the plot in the following image:

I originally created this plot in Tableau for an article about civil servants in international organizations. It’s apparent here that I wasn’t fully following the style guide when I created this plot: I’m using Tableau’s default fonts instead of Georgia, for example, and my source line should be in the bottom right rather than centered with the logo. I also think the “UN” labels on the barplots are pretty extraneous. That said, while this plot isn’t exactly groundbreaking, it should provide a good foundation for customizing a theme.
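As a rough preview of the kind of fix this involves, the two style-guide points mentioned above (Georgia as the font, source line at the bottom right) could be expressed as theme settings along these lines. This is only a sketch: the function name theme_gppr_sketch is hypothetical, it covers just those two points rather than the full GPPR style guide, and it assumes Georgia is installed and available to your graphics device.

```r
library(ggplot2)

# Hypothetical helper name; covers only the two style points named in the text
theme_gppr_sketch <- function(base_size = 11) {
  # Start from a built-in theme and set Georgia as the font for all text
  theme_minimal(base_size = base_size, base_family = "Georgia") +
    theme(
      # hjust = 1 right-aligns the caption (the source line).
      # Built-in themes already do this; it is spelled out here for clarity.
      plot.caption = element_text(hjust = 1)
    )
}
```

A theme function like this can be added to any plot with `+ theme_gppr_sketch()`, the same way you would add a built-in theme such as theme_minimal().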

This data was originally sourced from 2017 UN Chief Executive Human Resources Statistics; I adapted it slightly for the purposes of this tutorial (dropping some variables, etc.). Let’s load the tidyverse packages (which include ggplot2, the main focus of the tutorial) and read in the data.

#Necessary Packages
library(tidyverse)

#Specify column types ahead of time
col_types <- cols(
  Org = col_character(),
  LenServ = col_factor(),
  Sum = col_number()
)

#Reading in the data
UN <- read_csv('UN_Data_HR.csv', col_types = col_types)

The dataset isn’t the main focus of this tutorial, but here is a quick description of the variables:

Variable | Explanation
Org | The United Nations (UN) branch in question; could be the UN, WHO, UNICEF, UNHCR, or UNDP
LenServ | Length of service for a given employee; this is a categorical variable that could encompass “Less than 5 years”, “5-10 years”, “10 to 15 years”, “15 to 20 years”, “20 to 25 years”, and “25 or more years”
Sum | Refers to the sum of employees in each category
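One detail worth knowing about LenServ: when col_factor() is called without a levels argument, readr sets the factor levels in the order the values first appear in the file, and that order determines how the bars are arranged along the x-axis. If the file did not happen to list the categories from shortest to longest service, the levels could be given explicitly. The sketch below assumes the labels in the CSV are spelled exactly as in the description above; any value that does not match a listed level would be read as NA with a parsing warning. The names len_levels and col_types_ordered are just illustrative.

```r
library(readr)

# Levels copied from the variable description, shortest to longest service
len_levels <- c("Less than 5 years", "5-10 years", "10 to 15 years",
                "15 to 20 years", "20 to 25 years", "25 or more years")

# Same column specification as before, but with an explicit level order
col_types_ordered <- cols(
  Org = col_character(),
  LenServ = col_factor(levels = len_levels),
  Sum = col_number()
)

# Would be used in place of col_types when reading the file:
# UN <- read_csv('UN_Data_HR.csv', col_types = col_types_ordered)
```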

Let’s make the initial barplot, graphing the number of UN employees by their length of service:

#Keep only the rows where Org is the UN
un_only <- filter(UN, Org == 'UN')

#Recreating barplot with default theme
ggplot(
  #specify data to use
  data = un_only, aes(x = LenServ, y = Sum)) +

  #specify barplot (geom_col() is shorthand for geom_bar(stat = 'identity'))
  geom_col() +
  
  #specify text labels
  labs(x = "Length of Service", 
       y = "Total Employees", 
       title = "Number of United Nations employees by length of service",
       caption = "Source: UN Chief Executive Board \nHuman Resource Statistics, \n2017 | Madeline Pickens")