The grammar of graphics

The components of ggplot2’s grammar of graphics are

one or more datasets,

one or more geometric objects that serve as the visual representations of the data, – for instance, points, lines, rectangles, contours,

descriptions of how the variables in the data are mapped to visual properties (aesthetics) of the geometric objects, and an associated scale (e. g., linear, logarithmic, rank),

one or more coordinate systems,

statistical summarization rules,

a facet specification, i.e. the use of multiple similar subplots to look at subsets of the same data,

optional parameters that affect the layout and rendering, such text size, font and alignment, legend positions.

In the examples above, Figures 3.7 and 3.8, the dataset was groupsize, the variables were the numeric values as well as the names of groupsize, which we mapped to the aesthetics -axis and -axis respectively, the scale was linear on the and rank-based on the -axis (the bars are ordered alphanumerically and each has the same width), and the geometric object was the rectangular bar.

Items 4–7 in the above list are optional. If you don’t specify them, then the Cartesian is used as the coordinate system, the statistical summary is the trivial one (i.e., the identity), and no facets or subplots are made (we’ll see examples later on, in Section 3.8). The first three items are required: a valid ggplot2 “sentence” needs to contain at least one of each of them.

In fact, ggplot2’s implementation of the grammar of graphics allows you to use the same type of component multiple times, in what are called layers (Wickham 2010). For example, the code below uses three types of geometric objects in the same plot, for the same data: points, a line and a confidence band.

'dftx <- data.frame(t(Biobase::exprs(x)), pData(x))
library(ggplot2)
ggplot(dftx, aes(x = X1426642_at, y = X1418765_at)) +
  geom_point(shape = 1) +
  geom_smooth(method = "loess")'

## [1] "dftx <- data.frame(t(Biobase::exprs(x)), pData(x))\nlibrary(ggplot2)\nggplot(dftx, aes(x = X1426642_at, y = X1418765_at)) +\n  geom_point(shape = 1) +\n  geom_smooth(method = \"loess\")"

Here we had to assemble a copy of the expression data (Biobase::exprs(x)) and the sample annotation data (pData(x)) all together into the dataframe dftx – since this is the data format that ggplot2 functions most easily take as input (more on this in Section 13.10).

We can further enhance the plot by using colors – since each of the points in Figure 3.9 corresponds to one sample, it makes sense to use the sampleColour information in the object x.

'ggplot(dftx, aes(x = X1426642_at, y = X1418765_at))  +
  geom_point(aes(color = sampleGroup), shape = 19) +
  scale_color_manual(values = groupColor, guide = "none") +
  geom_smooth(method = "loess")'

## [1] "ggplot(dftx, aes(x = X1426642_at, y = X1418765_at))  +\n  geom_point(aes(color = sampleGroup), shape = 19) +\n  scale_color_manual(values = groupColor, guide = \"none\") +\n  geom_smooth(method = \"loess\")"

As an aside, if we want to find out which genes are targeted by these probe identifiers, and what they might do, we can call:

library("mouse4302.db")

## Loading required package: AnnotationDbi

## Loading required package: stats4

## Loading required package: BiocGenerics

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit, which.max, which.min

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## Loading required package: IRanges

## Loading required package: S4Vectors

## 
## Attaching package: 'S4Vectors'

## The following object is masked from 'package:utils':
## 
##     findMatches

## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname

## Loading required package: org.Mm.eg.db

##

##

AnnotationDbi::select(mouse4302.db,
   keys = c("1426642_at", "1418765_at"), keytype = "PROBEID",
   columns = c("SYMBOL", "GENENAME"))

## 'select()' returned 1:1 mapping between keys and columns

##      PROBEID SYMBOL                                            GENENAME
## 1 1426642_at    Fn1                                       fibronectin 1
## 2 1418765_at  Timd2 T cell immunoglobulin and mucin domain containing 2

Often when using ggplot you will only need to specify the data, aesthetics and a geometric object. Most geometric objects implicitly call a suitable default statistical summary of the data. For example, if you are using geom_smooth, then ggplot2 uses stat = “smooth” by default and displays a line; if you use geom_histogram, the data are binned and the result is displayed in barplot format. Here’s an example:

'dfx = as.data.frame(Biobase::exprs(x))
ggplot(dfx, aes(x = `20 E3.25`)) + geom_histogram(binwidth = 0.2)'

## [1] "dfx = as.data.frame(Biobase::exprs(x))\nggplot(dfx, aes(x = `20 E3.25`)) + geom_histogram(binwidth = 0.2)"

'pb = ggplot(groups, aes(x = sampleGroup, y = n))'

## [1] "pb = ggplot(groups, aes(x = sampleGroup, y = n))"

This creates a plot object pb. If we try to display it, it creates an empty plot, because we haven’t specified what geometric object we want to use. All that we have in our pb object so far are the data and the aesthetics (Figure 3.12).

'class(pb)'

## [1] "class(pb)"

'pb'

## [1] "pb"

'pb = pb + geom_bar(stat = "identity")
pb = pb + aes(fill = sampleGroup)
pb = pb + theme(axis.text.x = element_text(angle = 90, hjust = 1))
pb = pb + scale_fill_manual(values = groupColor, name = "Groups")
pb'

## [1] "pb = pb + geom_bar(stat = \"identity\")\npb = pb + aes(fill = sampleGroup)\npb = pb + theme(axis.text.x = element_text(angle = 90, hjust = 1))\npb = pb + scale_fill_manual(values = groupColor, name = \"Groups\")\npb"

This step-wise buildup –taking a graphics object already produced in some way and then further refining it– can be more convenient and easy to manage than, say, providing all the instructions upfront to the single function call that creates the graphic.

We can quickly try out different visualization ideas without having to rebuild our plots each time from scratch, but rather store the partially finished object and then modify it in different ways. For example we can switch our plot to polar coordinates to create an alternative visualization of the barplot.

'pb.polar = pb + coord_polar() +
  theme(axis.text.x = element_text(angle = 0, hjust = 1),
        axis.text.y = element_blank(),
        axis.ticks = element_blank()) +
  xlab("") + ylab("")
pb.polar'

## [1] "pb.polar = pb + coord_polar() +\n  theme(axis.text.x = element_text(angle = 0, hjust = 1),\n        axis.text.y = element_blank(),\n        axis.ticks = element_blank()) +\n  xlab(\"\") + ylab(\"\")\npb.polar"

Refrensi :

https://www.huber.embl.de/msmb/03-chap.html#the-grammar-of-graphics

The grammar of graphics

Syukur Halim, NIM :220605220014, Dosen Pembimbing Prof. Dr. Suhartono, M.Kom., Magister Informatika UIN Maulana Malik Ibrahim Malang, Indonesia

2023-11-07

The grammar of graphics