gggeom

Motivation

The goal of gggeom is to provide a compact way to represent geometric objects and useful tools to work with them. This package is (or soon will be) used to power ggvis: if you want to create a new layer function, you’ll need to be somewhat familiar with this package. But gggeom is low-level and not tightly tied to ggvis, so you could use it to implement a new graphics system if you wanted.

Compared to ggplot2, gggeom is somewhat similar to the geoms, but it has a much purer take on geometric primitives. For example, ggplot2 has geom_histogram(), which is really a combination of a statistical transformation (stat_bin()) and a bar (geom_bar()). gggeom avoids this muddle, sticking purely to geometric objects. It also provides many more tools for manipulating geometries, independent of the particular plot they will eventually generate.

All gggeom manipulations should be able to process ~100,000 geometries in less than 0.1 seconds. More geometries than that are unlikely to produce a useful plot. If you do have very large datasets, they should be summarised (using e.g. the tools in ggstat) before being visualised.

Geometric primitives

There are only three fundamental geometric primitives needed to draw any graphic:

points: \((x, y)\)
text: \((x, y)\)
paths/polygons: \(\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}\)

However, because these primitives are so general, it is hard to define useful operations on them. So gggeom provides a number of additional geometric objects that restrict the properties of points, paths and polygons in useful ways:

arcs: \((x, y, [r_1, r_2], [\theta_1, \theta_2])\).
lines: a path whose x values are increasing, \(x_i \le x_{i+1}\).
steps: a line drawn with only horizontal and vertical segments.
segments: a single line segment parameterised by \(x_1\), \(x_2\), \(y_1\), \(y_2\).
rects (and images): \(([x_1, x_2], [y_1, y_2])\).
ribbons: an ordered sequence of intervals, \(\{(x_1, [y_{11}, y_{12}]), \ldots, (x_n, [y_{n1}, y_{n2}])\}\), where \(x_i < x_{i+1}\).
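
The ordering constraints on lines and ribbons can be checked directly. The following is a minimal base R sketch, not gggeom code: the helpers is_line_x() and is_ribbon_x() are hypothetical names used only for illustration.

```r
# Hypothetical helpers (not part of gggeom): test the ordering constraints
# that separate a line or a ribbon from a general path.
is_line_x   <- function(x) all(diff(x) >= 0)  # x_i <= x_{i+1}
is_ribbon_x <- function(x) all(diff(x) > 0)   # x_i <  x_{i+1}

path_x   <- c(0, 2, 1, 3)  # doubles back in x: only a general path
line_x   <- c(0, 1, 1, 3)  # repeats an x value: a line, but not a ribbon
ribbon_x <- c(0, 1, 2, 3)  # strictly increasing: satisfies both

is_line_x(path_x)
is_line_x(line_x)
is_ribbon_x(line_x)
is_ribbon_x(ribbon_x)
```

Because a restricted geometry guarantees more about its data, operations that rely on that ordering can be defined for it but not for an arbitrary path.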

Geometries are described in terms of their position. When rendered, a geometric object will need other properties (like stroke, fill, stroke width, …), but gggeom concerns itself only with computations that involve position.

A geometry is represented as a data frame, where each row corresponds to a single object. You turn a data frame into a geometry using the appropriate render function:

scatter <- iris %>% render_point(~Sepal.Length, ~Sepal.Width) %>% head()
scatter
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species  x_  y_
#> 1          5.1         3.5          1.4         0.2  setosa 5.1 3.5
#> 2          4.9         3.0          1.4         0.2  setosa 4.9 3.0
#> 3          4.7         3.2          1.3         0.2  setosa 4.7 3.2
#> 4          4.6         3.1          1.5         0.2  setosa 4.6 3.1
#> 5          5.0         3.6          1.4         0.2  setosa 5.0 3.6
#> 6          5.4         3.9          1.7         0.4  setosa 5.4 3.9
#> Geometry: geom_point

The default behaviour of the render functions is to preserve all existing columns so that they can later be mapped to other properties of the geometry. However, this is mostly incidental to gggeom: it only works with the position columns (which all end in _ to avoid clashes with other variables).
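
The naming convention makes position columns easy to pick out. This is a small base R sketch that assumes position columns are exactly those whose names end in an underscore. It adds x_ and y_ by hand rather than calling a render function, so it runs without gggeom installed.

```r
# Build a data frame shaped like the render_point() output above.
df <- head(iris, 3)
df$x_ <- df$Sepal.Length
df$y_ <- df$Sepal.Width

# Assumption: a column is a position column if its name ends in "_".
is_position <- grepl("_$", names(df))
names(df)[is_position]

# Keep only the position data, dropping the original variables.
position_only <- df[, is_position, drop = FALSE]
position_only
```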

All geometries inherit from “geom” and “data.frame”. Additional inheritance structure is based not on the appearance of the geom but on the data needed to display the geom. This means that:

Polygons, lines and steps inherit from paths.
Text inherits from points.
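
This kind of inheritance can be expressed with ordinary S3 class vectors. The sketch below uses base R only. The constructor new_geom(), the generic describe() and the class name geom_text are hypothetical and chosen for illustration; only geom_point, geom_path and geom_polygon appear in the output shown in this README.

```r
# Hypothetical constructor: tag a data frame with an S3 class vector that
# ends in "geom" and "data.frame".
new_geom <- function(df, subclass) {
  structure(df, class = c(subclass, "geom", "data.frame"))
}

poly  <- new_geom(data.frame(id = 1), c("geom_polygon", "geom_path"))
label <- new_geom(data.frame(id = 1), c("geom_text", "geom_point"))

# Hypothetical generic: methods are written once for the parent class and
# reused by every geometry that needs the same kind of data.
describe <- function(x) UseMethod("describe")
describe.geom_path  <- function(x) "a vector of x/y vertices per row"
describe.geom_point <- function(x) "a single x/y position per row"

describe(poly)   # dispatches to the geom_path method
describe(label)  # dispatches to the geom_point method
inherits(poly, "geom_path")
```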

All geometries have a base graphics plot() method. These are useful for examples, explanation and debugging, but not for serious data visualisation. Additional arguments (...) are passed on to the underlying base graphics method, so if you’re familiar with the graphical parameters, you can tweak the appearance.

plot(scatter)

Paths (and polygons, lines and steps)

If each row represents a single object, how are paths, polygons, lines and steps represented? We take advantage of a relatively esoteric R feature - data frame columns can be lists. For example, take a look at the built-in nz data set:

head(nz)
#>           x_         y_         island
#> 1 <dbl[714]> <dbl[714]>          North
#> 2 <dbl[642]> <dbl[642]>          South
#> 3  <dbl[54]>  <dbl[54]>        Stewart
#> 4  <dbl[18]>  <dbl[18]>  Great.Barrier
#> 5  <dbl[16]>  <dbl[16]>     Resolution
#> 6   <dbl[5]>   <dbl[5]> Little.Barrier
#> Geometry: geom_polygon
plot(nz)

The x_ and y_ variables are lists of numeric vectors:

nz$x_[[5]]
#>  [1] 166 167 167 167 167 167 167 167 167 167 167 166 166 166 166 166
nz$y_[[5]]
#>  [1] -45.9 -45.8 -45.9 -45.9 -45.9 -45.9 -45.9 -45.8 -45.8 -45.8 -45.8
#> [12] -45.8 -45.9 -45.9 -45.9 -45.9
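
List columns can be built in base R without any extra packages. The following sketch constructs a small data frame with the same shape as nz; the coordinates and names are made up for illustration.

```r
# Start with one row per object, then attach list columns in which each
# element holds that object's vertex coordinates.
paths <- data.frame(island = c("A", "B"), stringsAsFactors = FALSE)
paths$x_ <- list(c(0, 1, 1, 0), c(2, 3, 2.5))
paths$y_ <- list(c(0, 0, 1, 1), c(0, 0, 1))

nrow(paths)         # still one row per object
lengths(paths$x_)   # number of vertices in each object
paths$x_[[2]]       # the x coordinates of the second object
```

Assigning a list whose length equals the number of rows is what creates the list column; each element may have a different length.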

As well as a plot() method, paths also have a points() method which makes it easier to see exactly where the data lie:

class(nz)
#> [1] "geom_polygon" "geom_path"    "geom"         "data.frame"
nz %>% subset(island == "Stewart") %>% plot() %>% points()

(Note that plot() invisibly returns the input data to make this sort of chaining easy.)

Converting to primitives

You can convert any geometry to its equivalent primitive path by using geometry_pointificate(). For example, imagine we have some rects:

df <- data.frame(x = c(1:3, 3), y = c(1:3, 2))
rects <- render_tile(df, ~x, ~y, width = 0.95, height = 0.95)

rects
#>   x y   x1_  x2_   y1_  y2_
#> 1 1 1 0.525 1.48 0.525 1.48
#> 2 2 2 1.525 2.48 1.525 2.48
#> 3 3 3 2.525 3.48 2.525 3.48
#> 4 3 2 2.525 3.48 1.525 2.48
#> Geometry: geom_rect
plot(rects)

We can convert these to four-point polygons with geometry_pointificate():

rects %>% geometry_pointificate()
#>   x y       x_       y_
#> 1 1 1 <dbl[4]> <dbl[4]>
#> 2 2 2 <dbl[4]> <dbl[4]>
#> 3 3 3 <dbl[4]> <dbl[4]>
#> 4 3 2 <dbl[4]> <dbl[4]>
#> Geometry: geom_polygon
rects %>% geometry_pointificate() %>% plot() %>% points()

The rendering looks similar at first glance, but by using the points() command we can see that each rectangle is composed of four points.
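
To make the conversion concrete, here is a minimal base R sketch of how one rect's bounds map to four polygon vertices. The helper rect_corners() is hypothetical, and the vertex order is an assumption: geometry_pointificate() may order the vertices differently.

```r
# Hypothetical helper: the four corners of an axis-aligned rect, listed
# counter-clockwise starting from the lower-left corner.
rect_corners <- function(x1, x2, y1, y2) {
  list(x = c(x1, x2, x2, x1),
       y = c(y1, y1, y2, y2))
}

# The first rect above: a 0.95 x 0.95 tile centred on (1, 1).
corners <- rect_corners(0.525, 1.475, 0.525, 1.475)
corners$x
corners$y
```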

The main advantage of converting to polygons is that a number of transformations make sense for polygons but not for rects, because the result would no longer be a rect:

polys <- rects %>% geometry_pointificate(close = TRUE)
# Rotate each polygon 5 degrees clockwise
polys %>% geometry_rotate(5) %>% plot() %>% points()

# Transform into polar coordinates
polys %>% geometry_warp("polar", tolerance = 0.0001) %>% plot() %>% points()


# Need to figure out why this looks so bad - must be warp bug :(

(In other words the set of rects is not closed under many useful transformations.)
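
A short base R sketch shows why rotation takes a rect outside the set of rects. It assumes the rotation is clockwise about the polygon's centre, which may not match what geometry_rotate() does; rotate_cw() is a hypothetical helper.

```r
# Corners of a 0.95 x 0.95 rect centred on (1, 1).
x <- c(0.525, 1.475, 1.475, 0.525)
y <- c(0.525, 0.525, 1.475, 1.475)

# Hypothetical helper. Assumption: rotate clockwise about the mean of the
# vertices; gggeom may use a different centre or convention.
rotate_cw <- function(x, y, degrees) {
  theta <- -degrees * pi / 180  # a negative angle rotates clockwise
  cx <- mean(x)
  cy <- mean(y)
  dx <- x - cx
  dy <- y - cy
  list(x = cx + dx * cos(theta) - dy * sin(theta),
       y = cy + dx * sin(theta) + dy * cos(theta))
}

rot <- rotate_cw(x, y, 5)

# A rect is fully described by two x values and two y values. After the
# rotation every corner has its own x and its own y, so four points are needed.
length(unique(round(rot$x, 6)))
length(unique(round(rot$y, 6)))
```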

There are also operations that make sense for rects, but not for general polygons. For example, it makes sense to stack rects so that the lower edge of the first falls on the x-axis and the rest stack up from there. There’s no useful way to stack arbitrary polygons:

rects %>% geometry_stack() %>% plot()
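
The idea behind stacking can be sketched in base R. This is not the gggeom implementation: it assumes that rects sharing the same x interval are stacked in row order starting from y = 0, and geometry_stack() may group or order them differently.

```r
# The four rects from the example above, with unrounded bounds.
rects_df <- data.frame(
  x1_ = c(0.525, 1.525, 2.525, 2.525),
  x2_ = c(1.475, 2.475, 3.475, 3.475),
  y1_ = c(0.525, 1.525, 2.525, 1.525),
  y2_ = c(1.475, 2.475, 3.475, 2.475)
)

# Assumption: rects with identical x intervals belong to the same stack.
height <- rects_df$y2_ - rects_df$y1_
group  <- paste(rects_df$x1_, rects_df$x2_)

# Running total of heights within each stack gives the new upper edges.
top <- ave(height, group, FUN = cumsum)

stacked <- rects_df
stacked$y1_ <- top - height
stacked$y2_ <- top
stacked
```

Only the last two rects share an x interval, so the fourth one ends up sitting on top of the third while the others simply drop to the x-axis.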

Geometric transformations

The following table lists all transformations implemented in gggeom in the rows, and the geometries to which they apply in the columns:

#>               
#>                arc line path point polygon rect ribbon segment text
#>   dodge                                    *                       
#>   flip         *   *    *    *     *       *    *      *       *   
#>   jitter       *   *    *    *     *       *    *      *       *   
#>   pointificate *        *    *     *       *    *              *   
#>   reflect      *   *    *    *     *       *    *      *       *   
#>   scale        *                           *    *                  
#>   simplify              *                                          
#>   stack        *   *    *    *     *       *    *      *       *   
#>   transform    *   *    *    *             *    *              *   
#>   warp                  *