lkdjiinr

lkdjiinr
ggplot2
Adding layers with geoms
Adding data with %+%
Families of geoms based on variable type

Checking out a first package on GitHub, since I should try to put one together myself.

Congratulations to @lkdjiinr. \o/

lkdjiinr

# library('devtools')
# install_github('lkdjiin/lkdjiinr')
library('lkdjiinr')

The package's concatenation operator, %+%, shares its name with the operator ggplot2 uses to modify a plot:

"foo" %+% "bar"

## [1] "foobar"

c("a", "a") %+% c("b", "c")

## [1] "ab" "ac"

`%+%`

## function (...) 
## paste(..., sep = "")
## <environment: namespace:lkdjiinr>

Perhaps revive double-pipes instead!

`||` <- function (...) {
  paste0(...)
}

c("a", "b") || c(1, 2)

## [1] "a1" "b2"

Or not.

remove(`||`)
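A less invasive option is a custom infix name that no attached package is likely to export. This is only a sketch: the name %cat% is my own assumption, not something lkdjiinr or ggplot2 provides.

```r
# Hypothetical infix operator; the name %cat% is an assumption, chosen only
# because no attached package is expected to export it.
`%cat%` <- function(lhs, rhs) {
  # paste0() is vectorised, so element-wise concatenation comes for free
  paste0(lhs, rhs)
}

"foo" %cat% "bar"
c("a", "a") %cat% c("b", "c")
```

Because the operator lives under its own name, attaching ggplot2 afterwards would not mask it.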

ggplot2

library('ggplot2')

## 
## Attaching package: 'ggplot2'
## 
## The following object is masked from 'package:lkdjiinr':
## 
##     %+%

library('scales')

Better detach lkdjiinr:

detach("package:lkdjiinr", unload=TRUE)

What’s %+% in ggplot2?

`%+%`

## function (e1, e2) 
## {
##     e2name <- deparse(substitute(e2))
##     if (is.theme(e1)) 
##         add_theme(e1, e2, e2name)
##     else if (is.ggplot(e1)) 
##         add_ggplot(e1, e2, e2name)
## }
## <environment: namespace:ggplot2>

The help page, ?`gg-add`, doesn’t really sell it, and it’s only mentioned briefly in one of the later chapters in the book.

It seems more obscure than it should be: it allows ggplot objects to be reused, and it can be combined with an empty ggplot that uses data.frame() as a placeholder.

Three steps I use to define ggplots:

1. Create a ggplot with an empty data.frame() and map aesthetics to variables.
2. Add geoms applicable to the type(s) of those variables.
3. Add data with %+% and define any facets last.

# an empty ggplot
# with aesthetics mapped to variables
g <- ggplot(data.frame(), aes(x = wt, y = mpg))

# add geom applicable to aesthetic variable types
g_point <- g + geom_point()

# add data with %+%, see ?`%+%`, add facets
g_point %+% mtcars + 
  facet_wrap(~am)

The empty ggplot object holds no data, so it uses very little memory and can be re-used with different geoms. Likewise, objects combined with geoms are re-usable without being bloated by a stored copy of the data. Adding data and conditioning last means the data only passes through the ggplot object when a plot is actually produced.

Separating data from the ggplot object helps to clarify the ‘grammar’ that relates the aesthetic mapping of variables in aes(...) to geom layers appropriate for the variable types (numeric, factor, etc.), and to conditioning with facets (i.e. subsetting data).
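The three steps can be wrapped in a small helper so that one data-less template serves several datasets. This is a rough sketch, assuming a ggplot2 version that still exports %+%; plot_with() is a hypothetical name of mine, not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper (not a ggplot2 function): attach data to a data-less
# template with %+%, then add facets last if a formula is supplied.
plot_with <- function(template, data, facets = NULL) {
  p <- template %+% data
  if (!is.null(facets)) {
    p <- p + facet_wrap(facets)
  }
  p
}

# steps 1 and 2: mapping plus geom, no data
template <- ggplot(data.frame(), aes(x = wt, y = mpg)) + geom_point()

# step 3: the same template, two different datasets
p_am  <- plot_with(template, mtcars, ~am)
p_sub <- plot_with(template, subset(mtcars, cyl == 4), ~gear)
```

The template itself is left untouched, so it still carries an empty data frame after both calls.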

Re-use g with a different geom:

g_smooth <- g + geom_smooth(method = 'loess')

g_smooth %+% mtcars + 
  facet_wrap(~am)

Adding layers with geoms

ls()

## [1] "g"        "g_point"  "g_smooth"

The empty ggplot, g, has variables mapped to aesthetics x and y, but no data or geom layers:

# only mapping and labels defined for x and y:
str(g)

## List of 9
##  $ data       :'data.frame': 0 obs. of  0 variables
##  $ layers     : list()
##  $ scales     :Reference class 'Scales' [package "ggplot2"] with 1 field
##   ..$ scales: NULL
##   ..and 21 methods, of which 9 are  possibly relevant:
##   ..  add, clone, find, get_scales, has_scale, initialize, input, n,
##   ..  non_position_scales
##  $ mapping    :List of 2
##   ..$ x: symbol wt
##   ..$ y: symbol mpg
##  $ theme      : list()
##  $ coordinates:List of 1
##   ..$ limits:List of 2
##   .. ..$ x: NULL
##   .. ..$ y: NULL
##   ..- attr(*, "class")= chr [1:2] "cartesian" "coord"
##  $ facet      :List of 1
##   ..$ shrink: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "null" "facet"
##  $ plot_env   :<environment: R_GlobalEnv> 
##  $ labels     :List of 2
##   ..$ x: chr "wt"
##   ..$ y: chr "mpg"
##  - attr(*, "class")= chr [1:2] "gg" "ggplot"

Adding a geom adds a layer:

str(g_point)

## List of 9
##  $ data       :'data.frame': 0 obs. of  0 variables
##  $ layers     :List of 1
##   ..$ :Classes 'proto', 'environment' <environment: 0x000000000ab50f70> 
##  $ scales     :Reference class 'Scales' [package "ggplot2"] with 1 field
##   ..$ scales: list()
##   ..and 21 methods, of which 9 are  possibly relevant:
##   ..  add, clone, find, get_scales, has_scale, initialize, input, n,
##   ..  non_position_scales
##  $ mapping    :List of 2
##   ..$ x: symbol wt
##   ..$ y: symbol mpg
##  $ theme      : list()
##  $ coordinates:List of 1
##   ..$ limits:List of 2
##   .. ..$ x: NULL
##   .. ..$ y: NULL
##   ..- attr(*, "class")= chr [1:2] "cartesian" "coord"
##  $ facet      :List of 1
##   ..$ shrink: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "null" "facet"
##  $ plot_env   :<environment: R_GlobalEnv> 
##  $ labels     :List of 2
##   ..$ x: chr "wt"
##   ..$ y: chr "mpg"
##  - attr(*, "class")= chr [1:2] "gg" "ggplot"

What’s in a layer?

g_point$layers

## [[1]]
## geom_point: na.rm = FALSE 
## stat_identity:  
## position_identity: (width = NULL, height = NULL)

g_smooth$layers

## [[1]]
## geom_smooth:  
## stat_smooth: method = loess 
## position_identity: (width = NULL, height = NULL)
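Layers accumulate in the order the geoms are added, which can be checked without printing the whole structure. The sketch below assumes a reasonably recent ggplot2 (version 2.0 or later), where each layer exposes its geom as a ggproto object rather than the proto objects shown in the output above.

```r
library(ggplot2)

g <- ggplot(data.frame(), aes(x = wt, y = mpg))

# two geoms, so two layers, still without any data
g_both <- g + geom_point() + geom_smooth(method = "loess")

length(g$layers)
length(g_both$layers)

# name of the geom class behind each layer, in the order they were added
vapply(g_both$layers, function(layer) class(layer$geom)[1], character(1))
```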

Adding data with `%+%`

Adding data relies on an internal function, add_ggplot:

getAnywhere(`%+%`)

## A single object matching '%+%' was found
## It was found in the following places
##   package:ggplot2
##   namespace:ggplot2
## with value
## 
## function (e1, e2) 
## {
##     e2name <- deparse(substitute(e2))
##     if (is.theme(e1)) 
##         add_theme(e1, e2, e2name)
##     else if (is.ggplot(e1)) 
##         add_ggplot(e1, e2, e2name)
## }
## <environment: namespace:ggplot2>

getAnywhere(add_ggplot)

## A single object matching 'add_ggplot' was found
## It was found in the following places
##   namespace:ggplot2
## with value
## 
## function (p, object, objectname) 
## {
##     if (is.null(object)) 
##         return(p)
##     p <- plot_clone(p)
##     if (is.data.frame(object)) {
##         p$data <- object
##     }
##     else if (is.theme(object)) {
##         p$theme <- update_theme(p$theme, object)
##     }
##     else if (inherits(object, "scale")) {
##         p$scales$add(object)
##     }
##     else if (inherits(object, "labels")) {
##         p <- update_labels(p, object)
##     }
##     else if (inherits(object, "guides")) {
##         p <- update_guides(p, object)
##     }
##     else if (inherits(object, "uneval")) {
##         p$mapping <- defaults(object, p$mapping)
##         labels <- lapply(object, deparse)
##         names(labels) <- names(object)
##         p <- update_labels(p, labels)
##     }
##     else if (is.coord(object)) {
##         p$coordinates <- object
##         p
##     }
##     else if (is.facet(object)) {
##         p$facet <- object
##         p
##     }
##     else if (is.list(object)) {
##         for (o in object) {
##             p <- p + o
##         }
##     }
##     else if (is.proto(object)) {
##         p <- switch(object$class(), layer = {
##             p$layers <- append(p$layers, object)
##             mapping <- make_labels(object$mapping)
##             default <- make_labels(object$stat$default_aes())
##             new_labels <- defaults(mapping, default)
##             p$labels <- defaults(p$labels, new_labels)
##             p
##         }, coord = {
##             p$coordinates <- object
##             p
##         })
##     }
##     else {
##         stop("Don't know how to add ", objectname, " to a plot", 
##             call. = FALSE)
##     }
##     set_last_plot(p)
##     p
## }
## <environment: namespace:ggplot2>

If p is a ggplot object and the object added is a data.frame, that data frame is assigned to p$data:

##     p <- plot_clone(p)
##     if (is.data.frame(object)) {
##         p$data <- object
##     }

So it could also be used to empty a ggplot of data, substituting an empty data.frame():

# with data and one geom layer:
g_data <- g_point %+% mtcars 

g_data + 
  facet_wrap(~am)

# without data, but adding a facet
g_empty <- g_data %+% data.frame() + 
  facet_wrap(~am)

# note layer and facet are defined
str(g_empty)

## List of 9
##  $ data       :'data.frame': 0 obs. of  0 variables
##  $ layers     :List of 1
##   ..$ :Classes 'proto', 'environment' <environment: 0x000000000ac8df88> 
##  $ scales     :Reference class 'Scales' [package "ggplot2"] with 1 field
##   ..$ scales: list()
##   ..and 21 methods, of which 9 are  possibly relevant:
##   ..  add, clone, find, get_scales, has_scale, initialize, input, n,
##   ..  non_position_scales
##  $ mapping    :List of 2
##   ..$ x: symbol wt
##   ..$ y: symbol mpg
##  $ theme      : list()
##  $ coordinates:List of 1
##   ..$ limits:List of 2
##   .. ..$ x: NULL
##   .. ..$ y: NULL
##   ..- attr(*, "class")= chr [1:2] "cartesian" "coord"
##  $ facet      :List of 7
##   ..$ facets  :List of 1
##   .. ..$ am: symbol am
##   .. ..- attr(*, "env")=<environment: 0x000000000b164338> 
##   .. ..- attr(*, "class")= chr "quoted"
##   ..$ free    :List of 2
##   .. ..$ x: logi FALSE
##   .. ..$ y: logi FALSE
##   ..$ as.table: logi TRUE
##   ..$ drop    : logi TRUE
##   ..$ ncol    : NULL
##   ..$ nrow    : NULL
##   ..$ shrink  : logi TRUE
##   ..- attr(*, "class")= chr [1:2] "wrap" "facet"
##  $ plot_env   :<environment: R_GlobalEnv> 
##  $ labels     :List of 2
##   ..$ x: chr "wt"
##   ..$ y: chr "mpg"
##  - attr(*, "class")= chr [1:2] "gg" "ggplot"

# add data again and another geom layer
g_empty %+% mtcars +
  geom_smooth(method = 'loess')

I suppose that helps when developing plots with data, perhaps a small subset, before removing that data with %+% data.frame() and adding other datasets to the emptied ggplot object.
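The replace-on-data-frame behaviour of add_ggplot can be mimicked in base R with a toy object. This is purely illustrative and much simpler than the real function: toy_plot() and %with% are hypothetical names, and layers are reduced to plain strings.

```r
# Toy stand-in for a plot object: a list holding data and layer names.
toy_plot <- function() {
  structure(list(data = data.frame(), layers = character(0)),
            class = "toy_plot")
}

# Hypothetical operator (not ggplot2's %+%): branch on the type of `object`,
# loosely following the if/else chain in add_ggplot.
`%with%` <- function(p, object) {
  if (is.null(object)) {
    return(p)
  }
  if (is.data.frame(object)) {
    p$data <- object                 # data is replaced, never merged
  } else if (is.character(object)) {
    p$layers <- c(p$layers, object)  # layers are appended
  } else {
    stop("Don't know how to add this object to a toy_plot", call. = FALSE)
  }
  p
}

template <- toy_plot() %with% "point"
filled   <- template %with% mtcars
emptied  <- filled %with% data.frame()
```

R's copy-on-modify semantics mean template keeps its empty data frame after filled is created, and emptied keeps its layer after the data is swapped out again.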

Families of geoms based on variable type

Defining only the minimal aesthetic mapping in the original ggplot object makes it more adaptable. For example, if x is numeric, a family of geoms and stats is applicable (geom_histogram, geom_density, stat_ecdf).

g_mpg <- ggplot(data.frame(), aes(x = mpg)) +
  labs(x = 'miles per gallon')

g_hist <- g_mpg + geom_histogram(binwidth = 2)
g_dens <- g_mpg + geom_density()
g_ecdf <- g_mpg + stat_ecdf()

g_hist %+% mtcars + 
  facet_wrap(~am)

g_dens %+% mtcars + 
  scale_y_continuous(labels = percent) + 
  facet_wrap(~am)

g_ecdf %+% mtcars +
  facet_wrap(~am) +
  labs(y = 'ecdf')

If x is a factor (or character), the most obvious applicable geom is geom_bar, with position varied from the default stack to dodge and fill:

g_cyl <- ggplot(data.frame(), aes(x = factor(cyl), fill = factor(gear))) +
  labs(x = 'cylinders', fill = 'gears')

g_bars <- g_cyl + geom_bar(alpha = 2/3)
g_bard <- g_cyl + geom_bar(position = 'dodge', alpha = 2/3)
g_barf <- g_cyl + geom_bar(position = 'fill', alpha = 2/3)

g_bars %+% mtcars + facet_wrap(~am)

g_bard %+% mtcars + facet_wrap(~am)

g_barf %+% mtcars + 
  scale_y_continuous(labels = percent) + 
  facet_wrap(~am) +
  labs(y = NULL)
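The idea of geom families keyed on variable type can be written down as a small lookup. This is a sketch only: candidate_layers() is a hypothetical helper, and it lists just the layers used in this post rather than everything ggplot2 offers.

```r
# Hypothetical helper: suggest layers for a single x variable by its type.
# The suggestions are only the ones named in this post, not a complete list.
candidate_layers <- function(x) {
  if (is.numeric(x)) {
    c("geom_histogram", "geom_density", "stat_ecdf")
  } else if (is.factor(x) || is.character(x)) {
    "geom_bar"
  } else {
    character(0)
  }
}

candidate_layers(mtcars$mpg)
candidate_layers(factor(mtcars$cyl))
```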

lkdjiinr

M.Devlin

July 2015
