Grammar of Graphics: ggplot2 package

Alper Yilmaz

2023-05-04

You can view this presentation at https://rpubs.com/alperyilmaz/ggplot-slides

Always plot your data!

ggplot package

  • Extremely powerful and flexible
  • Consistent (grammar of graphics)
  • Very large user base and active development
  • At the beginning it’s hard, but then it pays off

Lots of documentation and tutorials

Cheatsheet

Data should be tidy

  • Each variable must have its own column.
  • Each observation must have its own row.
  • Each value must have its own cell.
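As a small illustration of these three rules, here is a sketch that reshapes a "wide" table into tidy form with pivot_longer() from tidyr. The table and its numbers are made up for the example (they are not Gapminder values).

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: the variable "year" is hidden in the column names
wide <- tribble(
  ~country, ~`2002`, ~`2007`,
  "A",         70.1,    71.3,
  "B",         55.4,    57.9
)

# Tidy version: one column per variable, one row per country-year observation
tidy <- wide |>
  pivot_longer(
    cols = -country,
    names_to = "year",
    values_to = "lifeExp",
    names_transform = list(year = as.integer)
  )

tidy
```

After reshaping, year and lifeExp can be mapped directly to aesthetics.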

Packages and data

We’ll be using Gapminder data

library(tidyverse)
library(gapminder)
gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# … with 1,694 more rows

Aim

Do you think we can draw a plot similar to the one on the Gapminder site?

Layers of ggplot

Here’s a visual guide to the different layers in ggplot.

First plot

gapminder %>% 
  ggplot()

Second plot (data layer)

gapminder %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp))

Aesthetics (aes) map data variables (here gdpPercap and lifeExp) to graphic elements (here the x and y axes).

Individual geoms

gapminder %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp)) +
  geom_point()

Questions

  • What do you think geom_bar() and geom_col() will produce?
  • If we want to plot lines with geom_line(), do we need to group data by year or country?
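One way to explore these questions is to try the geoms directly. The sketch below is only a starting point; the object names (line_plot, bar_plot, pop_2007, col_plot) are made up for this example.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# group tells geom_line() which points belong to the same line
line_plot <- ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.3)

# geom_bar() counts the rows for each x value by itself
bar_plot <- ggplot(gapminder, aes(x = continent)) +
  geom_bar()

# geom_col() draws bar heights that are already in the data (y aesthetic).
# pop is an integer column, so convert it before summing to avoid overflow.
pop_2007 <- gapminder |>
  filter(year == 2007) |>
  group_by(continent) |>
  summarise(total_pop = sum(as.numeric(pop)))

col_plot <- ggplot(pop_2007, aes(x = continent, y = total_pop)) +
  geom_col()
```

Print line_plot, bar_plot and col_plot one by one and compare what each geom needs from the data.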

Piping processed data to ggplot

gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp)) +
  geom_point()

Note

OMG! All tidyverse verbs and pipes can be combined with ggplot.

Warning

Use |> or %>% for combining verbs, but you should use + for ggplot layers

Geom properties

Size, color, shape, width, transparency (alpha) of geoms can be either:

  • set to a constant value
  • or mapped to a variable in our data

Change color of all points

gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp)) +
  geom_point(color="red")

Map color to variable

gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp)) +
  geom_point(aes(color=continent))

Color: to map or not to map

Not mapping

geom_point(colour = "red")
# colour is given a concrete value ('red')

vs. mapping

geom_point(aes(colour = continent))
# colour maps a *variable* (using `aes`)
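A common slip is to put a constant inside aes(). The sketch below contrasts the two cases; the object names p_set and p_mapped are made up for this example.

```r
library(ggplot2)
library(gapminder)

# Constant outside aes(): every point is drawn in red, no legend
p_set <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(colour = "red")

# Constant inside aes(): "red" is treated as a data value with a single level,
# so the points get a colour from the default palette and a legend entry "red"
p_mapped <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(colour = "red"))
```

If your points come out in an unexpected colour with a strange legend, check whether a constant ended up inside aes().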

Mapping more features

Let’s map color to continent and point size to population

gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp, color=continent)) +
  geom_point(aes(size=pop))

Mapped values can be scaled (Scales layer)

Colors of discrete values are controlled by scale_color_discrete, scale_color_manual (and many more color palette packages).

Colors of continuous values are controlled by scale_color_continuous, scale_color_gradient (and many more color palette packages).

continent is discrete and lifeExp is continuous.

Let’s map color to continent and then modify the colors.

Discrete mapping scale

gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp, color=continent)) +
  geom_point() +
  scale_color_manual(values = c("orange", "blue","red","green","black"))

Continuous mapping scale

Now, let’s map color to lifeExp and then modify the colors.

gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp, color=lifeExp)) +
  geom_point() +
  scale_color_gradient(low="blue",high = "red")

Scaling of X or Y coordinates

As a general rule of thumb, if the range is large and a few large values squeeze the rest of the data together, use a log scale. You can either use mutate() to generate a new column or use one of the scaling functions (type scale_ and then press Tab to see them). Below, scale_x_log10() was used for scaling gdpPercap data.
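The two options can be sketched as follows. The names gap_2007, log_gdp, p_mutate and p_scale are made up for this example, and the 2007 filter is an assumption carried over from the other slides.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

gap_2007 <- gapminder |> filter(year == 2007)

# Option 1: transform the data with mutate(); the axis shows log10 values
p_mutate <- gap_2007 |>
  mutate(log_gdp = log10(gdpPercap)) |>
  ggplot(aes(x = log_gdp, y = lifeExp)) +
  geom_point()

# Option 2: transform the scale; the axis labels stay in the original units
p_scale <- gap_2007 |>
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()
```

Both put the points in the same positions; the scale version is usually easier to read because the axis labels remain in GDP units.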

Multiple geom_point layers

ggplot is very powerful and flexible. You can combine different geoms and generate unique plots. Let’s color and emphasize countries with a population greater than 100 million.

We’ll generate another data frame that keeps only those countries and then plot it as a separate geom_point layer.

populated <- gapminder |> filter(year == 2007, pop > 100000000)
gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp)) +
  geom_point() +
  geom_point(data=populated, color="red", size=3)

Faceting layer

gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp)) +
  geom_point() +
  facet_wrap(~continent)  # separate plots for each continent

Questions

  • By default the x and y scales are shared by all panels (scales = "fixed"), which is usually what you want. What happens if we set scales to “free_x”, “free_y” or “free” (both)?
  • How can we have the life expectancy axis start from 0 (not the minimum value)?
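Here is a sketch you can use to try both questions. expand_limits() is one of several ways to include 0 on an axis; the object names are made up for this example.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

gap_2007 <- gapminder |> filter(year == 2007)

# Each panel gets its own x and y range
p_free <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  facet_wrap(~continent, scales = "free")

# Shared scales, with the y axis stretched so that it includes 0
p_zero <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  facet_wrap(~continent) +
  expand_limits(y = 0)
```

Compare p_free with the default plot and ask yourself whether the panels are still easy to compare with each other.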

Statistical summary as layer

gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~continent) 
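With no arguments, geom_smooth() picks a smoothing method by itself and draws a confidence band. As a variation, the sketch below asks for a straight-line fit without the band; the choice of method = "lm" and the log-scaled x axis are assumptions for this example, not part of the plot above.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

p_lm <- gapminder |>
  filter(year == 2007) |>
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  # linear model per panel, confidence band switched off
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_log10() +
  facet_wrap(~continent)
```

Panels with very few points (Oceania has only two countries) show why a summary layer should be read with care.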

Theme as layer

You can customize almost all aspects of your plot with the theme() function. There are also theme_* functions which modify the appearance of the plot by altering several aspects at the same time.
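As a sketch of fine-grained control with theme(), the example below changes three individual elements. The particular tweaks are arbitrary choices for illustration.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

p_custom <- gapminder |>
  filter(year == 2007) |>
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  theme(
    legend.position = "bottom",                # move the legend below the plot
    axis.title = element_text(face = "bold"),  # bold axis titles
    panel.grid.minor = element_blank()         # remove minor grid lines
  )
```

A theme_* function and theme() can be combined; put theme() last so that your tweaks are not overwritten.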

Let’s apply theme_classic() to our plot (result in next slide).

gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp, color=continent)) +
  geom_point(aes(size=pop)) +
  scale_x_log10() +
  theme_classic()

Theme packages

There are many theme packages which mimic the look of well-known publications. Let’s apply The Economist theme to our plot with a single line of code.

library(ggthemes)

gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp, color=continent)) +
  geom_point(aes(size=pop)) +
  scale_x_log10() +
  theme_economist()     # requires ggthemes package

Titles

The labs() function can be used to add a title and subtitle and to change the X and Y axis labels. Below, we added the following layer:

labs(x="GDP Per Capita (USD, Log10)", y="Life Expectancy (years)", 
     title="Wealth vs. Life Expectancy", 
     subtitle="Source: Gapminder (range: 1952-2007)")
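For context, here is a sketch of a complete plot with that labs() layer attached. The slide does not show which plot the layer was added to, so the base plot here (all years, colored by continent, log-scaled x axis) is an assumption.

```r
library(ggplot2)
library(gapminder)

p_labeled <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  labs(x = "GDP Per Capita (USD, Log10)", y = "Life Expectancy (years)",
       title = "Wealth vs. Life Expectancy",
       subtitle = "Source: Gapminder (range: 1952-2007)")
```

labs() can also rename legends, for example color = "Continent".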

Interactive HTML output

There are many packages which can embed HTML-compatible plots and tables in your output. The code below generates an interactive plot with the plotly package. See the next slide for results.

You can hover, zoom in and out, select regions.

library(plotly)

gap_plot <- gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp, color=continent)) +
  geom_point(aes(size=pop)) +
  scale_x_log10() +
  theme_minimal()

ggplotly(gap_plot)

ggplot figures are objects

You can assign plots to an object and then re-use them. You can add a layer to an existing plot. Let’s make a plot and try different themes with it.

baseplot <- gapminder %>% 
  filter(year == 2007) %>% 
  ggplot(aes(x=gdpPercap, y=lifeExp, color=continent)) +
  geom_point(aes(size=pop)) +
  scale_x_log10()

baseplot + theme_dark()

baseplot + theme_linedraw()

Storing plots as objects also helps when composing one figure from several independent plots. Please check the patchwork and cowplot packages.
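A minimal sketch with patchwork, assuming the package is installed. The base plot is rebuilt here so the example runs by itself; the names p_dark, p_linedraw, side_by_side and stacked are made up.

```r
library(dplyr)
library(ggplot2)
library(gapminder)
library(patchwork)

baseplot <- gapminder |>
  filter(year == 2007) |>
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(aes(size = pop)) +
  scale_x_log10()

p_dark <- baseplot + theme_dark()
p_linedraw <- baseplot + theme_linedraw()

# patchwork overloads operators for plot objects:
side_by_side <- p_dark + p_linedraw  # two plots next to each other
stacked <- p_dark / p_linedraw       # one plot above the other
```

Printing side_by_side or stacked draws the combined figure, and it can be saved like any other plot.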

Saving figures

The ggsave() function is used to save the plots you generate. It can save plots in PNG, PDF and SVG formats, among others. The resolution and size of the image can be adjusted for high-quality images.

newplot <- baseplot + theme_classic()

ggsave(filename = "gdp-health.png", plot = newplot, dpi = 200, width = 6, height = 6)
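ggsave() chooses the output format from the file extension. The sketch below writes a PDF to a temporary folder; the file name and the simple plot are made up for this example, and width and height are in inches by default.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

p <- gapminder |>
  filter(year == 2007) |>
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +
  theme_classic()

# The .pdf extension tells ggsave() to use the PDF device
out_file <- file.path(tempdir(), "gdp-health.pdf")
ggsave(filename = out_file, plot = p, width = 6, height = 6)
```

Vector formats such as PDF stay sharp at any zoom level, so dpi only matters for raster formats such as PNG.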

Galleries

Please check the comprehensive list of ggplot-related resources in the awesome-ggplot2 repo on GitHub.

The R Graph Gallery nicely categorizes the possible plots, which gives a good idea of what is available.

ggplot is much more powerful than you can ever imagine. Please browse the galleries by the following users. For some users you might need to browse the individual folders.

ggbio

gggenomes

ggtree with ggtreeExtra

ggmsa