Version 1.1.1 - March 2024

License

Introduction

What is ggplot2?

Data Visualization: ggplot2 is a powerful data visualization package in R that allows you to create elegant and informative plots. It uses a grammar of graphics approach, which means that you can build up a plot in layers.

Aesthetic Mapping: The first step in creating a plot with ggplot2 is to specify the data and the aesthetic mappings. This is done with the ggplot() function, where aes() maps variables to aesthetics such as x and y position, color, size, and shape.

Geometric Objects (Geoms): After specifying the data and aesthetics, you can add geometric objects (geoms) to the plot. Geoms are the types of plot elements, such as points, lines, bars, and boxes. For example, geom_point() adds a scatter plot, geom_line() adds a line plot, and geom_bar() adds a bar plot.

Statistical Transformations (Stats): ggplot2 also provides statistical transformations (stats) that can be added to the plot. Stats compute new variables from the data before it is drawn. For example, stat_smooth() adds a smoothed line to the plot, and stat_bin() bins a continuous variable and counts the observations in each bin, which is what a histogram displays.
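As a minimal sketch of how data, aesthetics, a geom, and a stat combine, the following uses R's built-in mtcars dataset. The dataset and the linear smoother are assumptions chosen only for illustration; they are not part of the original material.

```r
library(ggplot2)

# mtcars ships with R; wt is the car weight, mpg the fuel consumption
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +     # data and aesthetic mappings
  geom_point() +                                # geom: one point per row
  stat_smooth(method = "lm", formula = y ~ x)   # stat: fitted line computed from the data

print(p)
```

Each `+` adds one layer, so the points are drawn first and the fitted line on top of them.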

Graphics Grammar

A plot is composed of:

  • data: the information to be visualized (a data frame)
  • mapping: how the data are mapped onto aesthetic attributes
    • layer
      • geometric elements (geom)
      • statistical transformations (stat)
    • scale: maps data values to aesthetic attributes (e.g., color, size)
    • coord system: maps data coordinates to the plane of the graphic
    • facet: breaks up the plot into small multiples
    • theme: controls supporting elements and finer display details
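The components listed above can be seen together in a single call. This is only a sketch: the built-in mtcars dataset, the "Dark2" palette, and the choice of faceting variable are assumptions made for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars,                                       # data
            aes(x = wt, y = mpg, color = factor(cyl))) +  # mapping
  geom_point() +                             # layer: point geom, identity stat
  scale_color_brewer(palette = "Dark2") +    # scale: cylinder count to color
  coord_cartesian() +                        # coord system (this is the default)
  facet_wrap(~ gear) +                       # facet: one panel per gear count
  theme_bw()                                 # theme: supporting elements

print(p)
```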

Basic elements

Any ggplot2 plot has three key components:

  • the data
  • aesthetic mappings
    • map data variables to aesthetic features
    • these are coordinates or attributes
  • visual layer (at least one)
    • defines the visual object
    • maps aesthetic features to geometric properties

Here is the data used for this exercise:

series <- data.frame(
  i = 1:10,
  linear = 1:10,
  fibonacci = c(1,1,2,3,5,8,13,21,34,55),
  square = (1:10)^2,
  log = log(1:10)
)

and the packages used are

ggplot2

tidyverse

Basic elements

ggplot(series, aes(x=i,y=fibonacci))+geom_point()

  • series : defines the data to be used
  • aes(x=i,y=fibonacci) : maps data to visual characteristics
    • i and fibonacci to the x and y coordinates respectively
    • Cartesian coordinates are implied by default
    • linear scales are implied
  • geom_point() : defines a layer that maps data to points
    • shape, color, size of points are implied by default
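To make the implied defaults visible, the sketch below writes them out explicitly and checks that the data prepared for drawing is the same in both versions. It assumes the default position scales are scale_x_continuous() and scale_y_continuous(), which holds for numeric variables.

```r
library(ggplot2)

series <- data.frame(i = 1:10,
                     fibonacci = c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55))

# Short form, relying on the defaults
p_short <- ggplot(series, aes(x = i, y = fibonacci)) + geom_point()

# Same plot with the implied scales and coordinate system written out
p_full <- ggplot(series, aes(x = i, y = fibonacci)) +
  geom_point() +
  scale_x_continuous() +
  scale_y_continuous() +
  coord_cartesian()

# layer_data() returns the data frame a layer actually draws
same <- isTRUE(all.equal(layer_data(p_short), layer_data(p_full)))
print(same)
```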

Mappings

  • The scale depends on the type of aesthetic
    • for position (x, y) the default is a simple linear scale
    • for other types of aesthetics the default varies

Scales and coordinates

Both scale and coordinates have (implicit) defaults:

  • the default scale depends on
    • the specific aesthetic
    • the type of the variable
  • the default coordinate system is coord_cartesian()
    • another option is coord_polar()

Default scale adapts to variable

ggplot(series, aes(x=factor(i),y=fibonacci))+geom_point()

A factor is mapped to equally spaced positions along the axis
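The effect is easier to see with unevenly spaced values. The following sketch is a hypothetical variation of the slide: it puts the Fibonacci values themselves on a discrete axis and inspects the positions used for drawing.

```r
library(ggplot2)

series <- data.frame(i = 1:10,
                     fibonacci = c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55))

# Hypothetical variation: the factor is built from fibonacci instead of i
p <- ggplot(series, aes(x = factor(fibonacci), y = i)) + geom_point()

# Each factor level gets one integer slot, regardless of the numeric value
slots <- as.numeric(layer_data(p)$x)
print(slots)
```

The values 34 and 55 end up in adjacent slots, exactly as 1 and 2 do, because only the order of the levels matters.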

Different coordinate system

ggplot(series, aes(x=i,y=linear))+geom_point()+
      coord_polar()

x maps to \(\theta\) (with max(x) \(\rightarrow 2\pi\)) and y maps to \(\rho\) (distance from center)

Different y axis scale

ggplot(series, aes(x=i,y=square))+geom_point()+
      scale_y_log10(minor_breaks=c(1:10,1:10*10))

A log scale is applied to the y position
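A short check of what the scale does: the values are transformed with log10 before they are drawn, while the axis labels still show the original units. The sketch below omits the minor_breaks argument for brevity.

```r
library(ggplot2)

series <- data.frame(i = 1:10, square = (1:10)^2)

p <- ggplot(series, aes(x = i, y = square)) +
  geom_point() +
  scale_y_log10()

# y values as used for drawing, i.e. after the scale transformation
y_drawn <- layer_data(p)$y
print(all.equal(y_drawn, log10(series$square)))
```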

Additional aesthetics

Aesthetics include:

  • position (x, y)
  • grouping (group)
  • other:
    • color : line or symbol color
    • fill : area fill color
    • shape : type of shape
    • size : size of the object

ggplot(series, 
         aes(x=i, y=fibonacci, color=fibonacci))+ geom_point()

A gradient scale is used for a continuous (numeric) variable

Additional features

ggplot(series%>%mutate( mag = fibonacci %/% 10), 
       aes(x=i, y=fibonacci, color=factor(mag)))+ geom_point()

A discrete color scale is used for a factor variable
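The sketch below contrasts the two cases and adds a third one that is an assumption beyond the slides: setting an aesthetic to a constant outside aes(), which applies the value to every point and produces no legend. The mag column is computed with base R instead of mutate() only to keep the example free of extra packages.

```r
library(ggplot2)

series <- data.frame(i = 1:10,
                     fibonacci = c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55))
series$mag <- series$fibonacci %/% 10   # base-R equivalent of the mutate() call

# Numeric variable: continuous gradient
p_cont <- ggplot(series, aes(x = i, y = fibonacci, color = mag)) + geom_point()

# Factor: one distinct color per level
p_disc <- ggplot(series, aes(x = i, y = fibonacci, color = factor(mag))) +
  geom_point()

# Constant set outside aes(): not a mapping, so no scale and no legend
p_fixed <- ggplot(series, aes(x = i, y = fibonacci)) +
  geom_point(color = "red", size = 3)

# Number of distinct colors drawn in the discrete case
n_colors <- length(unique(layer_data(p_disc)$colour))
print(n_colors)
```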

Scales

For each type of aesthetic, several scale functions are provided:

  • scale_x_.., scale_y_..
  • scale_color_..
  • scale_fill_..
  • scale_shape_..
  • scale_size_..
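As an illustration of two of these families, the sketch below sets explicit axis breaks with scale_x_continuous() and an output size range with scale_size(). The particular break positions and size range are arbitrary choices.

```r
library(ggplot2)

series <- data.frame(i = 1:10,
                     fibonacci = c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55))

p <- ggplot(series, aes(x = i, y = fibonacci, size = fibonacci)) +
  geom_point() +
  scale_x_continuous(breaks = 1:10) +  # one axis tick per index
  scale_size(range = c(1, 6))          # smallest point drawn at size 1, largest at 6

# Sizes actually used for drawing
size_range <- range(layer_data(p)$size)
print(size_range)
```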

Additional feature and scale

ggplot(series%>%mutate( mag = fibonacci %/% 10), 
       aes(x=i, y=fibonacci, color=mag))+ 
       scale_color_gradient(low="blue",high="gold")+ 
       geom_point()

Geometry layers

Geometry functions add new layers

  • geom_point() : draw points
  • geom_col() : draw a bar/column
  • geom_line() : draw lines connecting positions
  • geom_text() and geom_label() : write a text or label
  • geom_area() : draw a filled area

Layers are drawn in order of declaration, with the latest on top.

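The order of the other statements (scales, coordinates, themes) generally does not affect the result, except that a later statement of the same kind overrides an earlier one.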
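A small sketch of the layer ordering, using colors and sizes chosen only to make the overlap visible: the same two geoms are added in opposite order, and the layer added last is drawn on top.

```r
library(ggplot2)

series <- data.frame(i = 1:10,
                     fibonacci = c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55))

base <- ggplot(series, aes(x = i, y = fibonacci))

# Columns first, points second: the points sit on top of the columns
p_points_on_top <- base +
  geom_col(fill = "grey70") +
  geom_point(color = "red", size = 3)

# Points first, columns second: the columns partly cover the points
p_cols_on_top <- base +
  geom_point(color = "red", size = 3) +
  geom_col(fill = "grey70")

print(p_points_on_top)
print(p_cols_on_top)
```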

Geometry col

ggplot(series, aes(x=i, y=fibonacci))+ 
       geom_col()

Geometry line

ggplot(series, aes(x=i, y=fibonacci))+ 
       geom_line()

Using multiple layers

ggplot(series, aes(x=i, y=fibonacci, label=fibonacci))+ 
       geom_line() + geom_label()

Geometries with statistical transformation

A few geometries perform a transformation before mapping to an object

  • geom_bar() : compute frequencies of discrete variables
  • geom_histogram() : compute frequencies of bins of continuous vars
  • geom_boxplot() : compute boxplot
  • geom_violin(): compute a violin plot

Computing frequencies

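# The frequencies are already computed here, so stat = "identity"
# tells geom_bar() to plot them as they are instead of counting rows
data7 <- data.frame(category = c("A", "B", "C", "D"),
                    frequency = c(25, 20, 15, 30))
ggplot(data7, aes(x = category, y = frequency)) +
  geom_bar(stat = "identity")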
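When the data contains one row per observation rather than precomputed frequencies, geom_bar() does the counting itself. The sketch below builds such a raw table (a hypothetical expansion of the same four categories) and reads back the counts that ggplot2 computed.

```r
library(ggplot2)

# Hypothetical raw data: one row per observation
raw <- data.frame(category = rep(c("A", "B", "C", "D"),
                                 times = c(25, 20, 15, 30)))

# No y aesthetic: the default stat counts the rows in each category
p <- ggplot(raw, aes(x = category)) + geom_bar()

counts <- layer_data(p)$count
print(counts)
```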

Histogram geometry

data6 <- data.frame(age = c(
  8, 8, 12, 9, 11, 10, 12, 10, 9, 12,
  11, 9, 10, 10, 11, 12, 10, 11, 12, 8,
  9, 11, 10, 11, 11, 11, 9, 10, 11, 11,
  10, 9, 10, 11, 10, 12, 10, 12, 10, 9,
  10, 12, 11, 10, 9, 11, 11, 10, 9, 10
))
ggplot(data6, aes(x = age)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Values", x = "Age", y = "Frequency") +
  theme_minimal()
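The binning performed by geom_histogram() can be inspected with layer_data(). The sketch below uses a short, made-up vector of ages (not the data above) so that the result is easy to read.

```r
library(ggplot2)

# Hypothetical, shortened data
ages <- c(8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 12)

p <- ggplot(data.frame(age = ages), aes(x = age)) +
  geom_histogram(binwidth = 1)

# One row per bin: its limits and the number of observations inside
bins <- layer_data(p)[, c("xmin", "xmax", "count")]
print(bins)
```

Every observation falls into exactly one bin, so the counts add up to the number of rows in the data.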

Boxplot geometry

ggplot(series, aes(x=fibonacci))+ 
      geom_boxplot()
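The summary statistics behind the box can be read back in the same way. This sketch maps the variable to y instead of x (a vertical box plot, which is an assumption made here to keep the column names simple).

```r
library(ggplot2)

series <- data.frame(fibonacci = c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55))

# Vertical variant of the box plot
p <- ggplot(series, aes(y = fibonacci)) + geom_boxplot()

# Whisker ends, quartiles and median computed by the stat
stats <- layer_data(p)[, c("ymin", "lower", "middle", "upper", "ymax")]
print(stats)
```

The middle column holds the median of the variable; values beyond the whiskers are drawn as separate outlier points.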

Theme

The support elements and default visual features are defined by a theme

  • theme_classic() : similar to base R graphics
  • theme_gray() : the default theme (gray background)
  • theme_bw() : same as default but with white background
  • theme_light() : same as bw but with lighter lines
  • theme_dark() : dark gray background
  • theme_minimal() : minimalistic theme
  • theme_void() : no supporting elements

Changing the theme

ggplot(series, aes(x=factor(i),y=fibonacci))+geom_point()+
    theme_minimal()

The default theme can be changed with theme_set().
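A minimal sketch of theme_set(): it replaces the session-wide default and returns the previous theme, which makes it easy to restore later. The choice of theme_bw() and the bold axis titles are arbitrary.

```r
library(ggplot2)

series <- data.frame(i = 1:10,
                     fibonacci = c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55))

# Change the default for all following plots; keep the previous one
old <- theme_set(theme_bw())

# Drawn with theme_bw() although no theme is mentioned
p <- ggplot(series, aes(x = i, y = fibonacci)) + geom_point()

# A single plot can still override the default and adjust individual elements
p_custom <- p + theme_minimal() + theme(axis.title = element_text(face = "bold"))

print(p)
print(p_custom)

# Restore the previous default
theme_set(old)
```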

References