Adding pillar support to a datatype

Kirill Müller, Hadley Wickham

2017-11-07

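To extend the tibble package for new types of columnar data, you need to understand how printing works. The presentation of a column in a tibble is powered by four S3 generics: type_sum() determines what goes into the column header, pillar_shaft() determines what goes into the body of the column, and is_vector_s3() and obj_sum() are used when rendering list columns.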

If you have written an S3 or S4 class that can be used as a column, you can override these generics to make sure your data prints well in a tibble. To start, you must import the pillar package that powers the printing of tibbles. Either add pillar to the Imports: section of your DESCRIPTION, or simply call:

devtools::use_package("pillar")

This short vignette assumes a package that implements an S3 class "latlon" and uses roxygen2 to create documentation and the NAMESPACE file. For this vignette to work we need to attach pillar:

library(pillar)

Prerequisites

We define a class "latlon" that encodes geographic coordinates in a complex number. For simplicity, the values are printed as degrees and minutes only.
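As a minimal base-R sketch of this encoding (the variable names are illustrative and not part of the package): the longitude goes into the real part, the latitude into the imaginary part, and a missing coordinate pair becomes a single missing complex value.

```r
# Longitude goes into the real part, latitude into the imaginary part.
lat <- c(28.3411783, 32.7102978, NA)
lon <- c(-81.5480348, -117.1704058, NA)
z <- complex(real = lon, imaginary = lat)

# Re() and Im() recover the original coordinates.
Re(z)
Im(z)

# A missing coordinate pair shows up as a missing complex value.
is.na(z)
# [1] FALSE FALSE  TRUE
```

Storing both coordinates in one atomic vector means that subsetting and concatenation keep latitude and longitude together without extra bookkeeping.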

#' @export
latlon <- function(lat, lon) {
  as_latlon(complex(real = lon, imaginary = lat))
}

#' @export
as_latlon <- function(x) {
  structure(x, class = "latlon")
}

#' @export
c.latlon <- function(x, ...) {
  as_latlon(NextMethod())
}

#' @export
`[.latlon` <- function(x, i) {
  as_latlon(NextMethod())
}

#' @export
format.latlon <- function(x, ..., formatter = deg_rad) {
  x_valid <- which(!is.na(x))

  lat <- unclass(Im(x[x_valid]))
  lon <- unclass(Re(x[x_valid]))

  ret <- rep("<NA>", length(x))
  ret[x_valid] <- paste(
    formatter(lat, c("N", "S")),
    formatter(lon, c("E", "W"))
  )
  format(ret, justify = "right")
}

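# Format decimal degrees as whole degrees and minutes.
# Despite its name, `rad` holds minutes, not radians.
deg_rad <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- trunc(x)       # whole degrees
  x <- x - deg          # fractional part of a degree
  rad <- round(x * 60)  # minutes
  sprintf("%3d°%.2d'%s", deg, rad, pm[ifelse(sign >= 0, 1, 2)])
}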

#' @export
print.latlon <- function(x, ...) {
  cat(format(x), sep = "\n")
  invisible(x)
}

latlon(32.7102978, -117.1704058)
##  32°43'N 117°10'W
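To see where this output comes from, the conversion helper can be run on its own. The sketch below repeats deg_rad() verbatim so that it is self-contained; the variable names lat, lon and edge are illustrative.

```r
# Standalone copy of the helper defined above.
deg_rad <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- trunc(x)
  x <- x - deg
  rad <- round(x * 60)
  sprintf("%3d°%.2d'%s", deg, rad, pm[ifelse(sign >= 0, 1, 2)])
}

# 0.7102978 degrees is about 42.6 minutes, which rounds to 43.
lat <- deg_rad(32.7102978, c("N", "S"))
lat
# [1] " 32°43'N"

# A negative longitude selects the second suffix, "W".
lon <- deg_rad(-117.1704058, c("E", "W"))
lon
# [1] "117°10'W"

# Caveat: minutes are rounded independently of the degrees, so a value
# just below a whole degree prints with 60 minutes.
edge <- deg_rad(32.9999, c("N", "S"))
edge
# [1] " 32°60'N"
```

For a demonstration class this is acceptable; a production formatter would probably carry the rounded minutes over into the degrees.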

More methods are needed to make this class fully compatible with data frames; see, for example, the hms package for a more complete implementation.

Using in a tibble

Columns of this class can be used in a tibble right away, but the output will be less than ideal:

library(tibble)
data <- tibble(
  venue = "rstudio::conf",
  year  = 2017:2019,
  loc   = latlon(
    c(28.3411783, 32.7102978, NA),
    c(-81.5480348, -117.1704058, NA)
  ),
  paths = list(
    loc[1],
    c(loc[1], loc[2]),
    loc[2]
  )
)

data
## # A tibble: 3 x 4
##   venue          year loc                      paths       
##   <chr>         <int> <S3: latlon>             <list>      
## 1 rstudio::conf  2017 -81.5480348+28.3411783i  <S3: latlon>
## 2 rstudio::conf  2018 -117.1704058+32.7102978i <S3: latlon>
## 3 rstudio::conf  2019 <NA>                     <S3: latlon>

(The paths column is a list that contains arbitrary data, in our case latlon vectors. A list column is a powerful way to attach hierarchical or unstructured data to an observation in a data frame.)

The output has three main problems:

  1. The column type is displayed as <S3: latlon>. This default formatting works reasonably well for any kind of object, but the generated output may be too wide and waste precious space when displaying the tibble.
  2. The cells in the paths column are also displayed as <S3: latlon>.
  3. The values in the loc column are formatted as complex numbers (the underlying storage), without using the format() method we have defined. This is by design.

In the remainder I’ll show how to fix these problems, and also how to implement rendering that adapts to the available width.

Fixing the data type

To display <geo> as the data type, we need to override the type_sum() method. This method should return a length-1 character vector that can be used in a column header. For your own classes, strive for an evocative abbreviation that’s under 6 characters.

#' @export
type_sum.latlon <- function(x) {
  "geo"
}

Because the value shown there doesn’t depend on the data, we just return a constant. (For date-times, the column info will eventually contain information about the timezone, see #53.)
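A quick way to check a type_sum() method against the requirements stated above (a length-1 character vector, shorter than 6 characters) is to call it directly. This sketch deliberately avoids depending on pillar; with pillar attached, type_sum(x) would dispatch to the same method.

```r
type_sum.latlon <- function(x) {
  "geo"
}

# The method ignores the data, so any latlon-classed value works here.
x <- structure(complex(real = -117.17, imaginary = 32.71), class = "latlon")
label <- type_sum.latlon(x)

# Contract from the text: one short string suitable for a column header.
stopifnot(
  is.character(label),
  length(label) == 1,
  nchar(label) < 6
)
label
# [1] "geo"
```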

data
## # A tibble: 3 x 4
##   venue          year loc                      paths 
##   <chr>         <int> <geo>                    <list>
## 1 rstudio::conf  2017 -81.5480348+28.3411783i  <geo> 
## 2 rstudio::conf  2018 -117.1704058+32.7102978i <geo> 
## 3 rstudio::conf  2019 <NA>                     <geo>

Rendering the value

To use our format method for rendering, we implement the pillar_shaft() method for our class. (A pillar is mainly a shaft (decorated with an ornament), with a capital above and a base below. Multiple pillars form a colonnade, which can be stacked in multiple tiers. This is the motivation behind the names in our API.)

#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  new_pillar_shaft(out, align = "right")
}

The simplest variant calls our format() method; everything else is handled by pillar, in particular by the new_pillar_shaft() helper. Note how the align argument affects the alignment of NA values and of the column name and type.
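A note for readers on a released version of pillar: to my knowledge, the helper that accepts align, na_indent and min_width is exported there as new_pillar_shaft_simple(), whereas new_pillar_shaft() is the lower-level constructor. Treat this as an assumption about your installed version and check its help pages. A minimal sketch under that assumption, using pre-formatted strings in place of format(x):

```r
library(pillar)

# Pre-formatted strings standing in for format(x) on a latlon vector;
# the missing value is passed through as NA, as in the method above.
out <- c(" 28°20'N  81°33'W", " 32°43'N 117°10'W", NA)

# Assumption: the installed pillar exports new_pillar_shaft_simple().
shaft <- new_pillar_shaft_simple(out, align = "right")

inherits(shaft, "pillar_shaft")
```

If the call fails because the function does not exist, the installed version most likely still uses the API shown in this vignette.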

data
## # A tibble: 3 x 4
##   venue          year               loc paths 
##   <chr>         <int>             <geo> <list>
## 1 rstudio::conf  2017  28°20'N  81°33'W <geo> 
## 2 rstudio::conf  2018  32°43'N 117°10'W <geo> 
## 3 rstudio::conf  2019                NA <geo>

We could also use left alignment and indent only the NA values:

#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  new_pillar_shaft(out, align = "left", na_indent = 5)
}

data
## # A tibble: 3 x 4
##   venue          year loc               paths 
##   <chr>         <int> <geo>             <list>
## 1 rstudio::conf  2017  28°20'N  81°33'W <geo> 
## 2 rstudio::conf  2018  32°43'N 117°10'W <geo> 
## 3 rstudio::conf  2019      NA           <geo>

Adaptive rendering

If there is not enough space to render the values, the formatted values are truncated with an ellipsis. This doesn’t currently apply to our class, because we haven’t specified a minimum width for our values:

print(data, width = 35)
## # A tibble: 3 x 4
##   venue     year loc              
##   <chr>    <int> <geo>            
## 1 rstudio…  2017  28°20'N  81°33'W
## 2 rstudio…  2018  32°43'N 117°10'W
## 3 rstudio…  2019      NA          
## # ... with 1 more variable:
## #   paths <list>

If we specify a minimum width when constructing the shaft, the loc column will be truncated:

#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  new_pillar_shaft(out, align = "right", min_width = 10)
}

print(data, width = 35)
## # A tibble: 3 x 4
##   venue    year          loc paths
##   <chr>   <int>        <geo> <lis>
## 1 rstudi…  2017  28°20'N  8… <geo>
## 2 rstudi…  2018  32°43'N 11… <geo>
## 3 rstudi…  2019           NA <geo>

This may be useful for character data, but for lat-lon data we may prefer to show full degrees and remove the minutes if the available space is not enough to show accurate values. A more sophisticated implementation of the pillar_shaft() method is required to achieve this:

#' @export
pillar_shaft.latlon <- function(x, ...) {
  deg <- format(x, formatter = deg)
  deg[is.na(x)] <- style_na("NA")
  deg_rad <- format(x)
  deg_rad[is.na(x)] <- style_na("NA")
  ret <- structure(
    list(deg = deg, deg_rad = deg_rad),
    class = c("pillar_shaft_latlon", "pillar_shaft")
  )
  ret <- set_width(ret, max(crayon::col_nchar(deg_rad), 0))
  ret <- set_min_width(ret, max(crayon::col_nchar(deg), 0))
  ret
}

Here, pillar_shaft() returns an object of the "pillar_shaft_latlon" class (which is also a "pillar_shaft") that contains the necessary information to render the values, and also minimum and maximum width values. For simplicity, both formattings are pre-rendered, and the minimum and maximum widths are computed from there. Note that we also need to take care of NA values explicitly. (crayon::col_nchar() is like nchar() but strips the formatting added by style_na().)
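The width computation can be tried out in isolation. The sketch below uses plain nchar() on hand-written strings that stand in for the two pre-rendered formattings; with styled strings you would need crayon::col_nchar() as in the method above.

```r
# Pre-rendered strings standing in for the two formattings of a latlon vector.
deg_rad <- c(" 28°20'N  81°33'W", " 32°43'N 117°10'W")
deg     <- c(" 28°N 82°W", "33°N 117°W")

# Preferred width: enough room for the detailed rendering.
width <- max(nchar(deg_rad), 0)
width
# [1] 17

# Minimum width: enough room for the coarse rendering.
min_width <- max(nchar(deg), 0)
min_width
# [1] 10

# The trailing 0 keeps max() well defined for zero-length input.
empty_width <- max(nchar(character(0)), 0)
empty_width
# [1] 0
```

Without the extra 0, max() on a zero-length vector would return -Inf with a warning, which is not a usable width.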

For completeness, the code that implements the degree-only formatting looks like this:

deg <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- round(x)
  sprintf("%d°%s", deg, pm[ifelse(sign >= 0, 1, 2)])
}

All that’s left to do is to implement a format() method for our new "pillar_shaft_latlon" class. This method will be called with a width argument, which then determines which of the formattings to choose:

#' @export
format.pillar_shaft_latlon <- function(x, width, ...) {
  if (all(crayon::col_nchar(x$deg_rad) <= width)) {
    ornament <- x$deg_rad
  } else {
    ornament <- x$deg
  }

  new_ornament(ornament)
}

data
## # A tibble: 3 x 4
##   venue          year loc               paths 
##   <chr>         <int> <geo>             <list>
## 1 rstudio::conf  2017  28°20'N  81°33'W <geo> 
## 2 rstudio::conf  2018  32°43'N 117°10'W <geo> 
## 3 rstudio::conf  2019 NA                <geo>
print(data, width = 35)
## # A tibble: 3 x 4
##   venue    year loc          paths
##   <chr>   <int> <geo>        <lis>
## 1 rstudi…  2017  28°N 82°W   <geo>
## 2 rstudi…  2018 33°N 117°W   <geo>
## 3 rstudi…  2019 NA           <geo>
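The width-based choice made by format.pillar_shaft_latlon() can be sketched without pillar. In the code below, the list shaft and the helper choose_rendering() are hypothetical stand-ins for the shaft object and its format() method, and nchar() replaces crayon::col_nchar() because the strings carry no styling.

```r
# Hypothetical stand-in for a "pillar_shaft_latlon" object: a plain list
# holding both pre-rendered formattings.
shaft <- list(
  deg     = c(" 28°N 82°W", "33°N 117°W"),
  deg_rad = c(" 28°20'N  81°33'W", " 32°43'N 117°10'W")
)

# Hypothetical helper mirroring the branch in format.pillar_shaft_latlon():
# use the detailed rendering only if every value fits into `width`.
choose_rendering <- function(x, width) {
  if (all(nchar(x$deg_rad) <= width)) {
    x$deg_rad
  } else {
    x$deg
  }
}

wide   <- choose_rendering(shaft, width = 17)  # detailed rendering fits
narrow <- choose_rendering(shaft, width = 12)  # falls back to degrees only
```

The decision is made for the column as a whole, so all values in a column share the same level of detail.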

Adding color

Both new_pillar_shaft() and new_ornament() accept escape codes for coloring, emphasis, or other ways of highlighting text on terminals that support it. Some formattings are predefined, e.g. style_subtle() displays text in a light gray. For default data types, this style is used for insignificant digits. We’ll be formatting the degree and minute signs in a subtle style, because they serve only as separators. You can also use the crayon package to add custom formattings to your text.
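Escape codes add characters that take up no space on screen, which is why the width computations earlier use crayon::col_nchar() rather than nchar(). The base-R sketch below illustrates the difference with hand-written escape sequences; the exact codes emitted by style_subtle() may differ, so these are only illustrative.

```r
# A degree sign wrapped in raw ANSI escapes (illustrative codes:
# "bright black" foreground on, then default foreground).
styled <- paste0("\033[90m", "°", "\033[39m")

# nchar() counts the escape sequences too.
nchar(styled)
# [1] 11

# Strip the escape sequences before measuring the display width.
plain <- gsub("\033\\[[0-9;]*m", "", styled)
nchar(plain)
# [1] 1
```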

#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x, formatter = deg_rad_color)
  out[is.na(x)] <- NA
  new_pillar_shaft(out, align = "left", na_indent = 5)
}

deg_rad_color <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- trunc(x)
  x <- x - deg
  rad <- round(x * 60)
  ret <- sprintf(
    "%d%s%.2d%s%s",
    deg,
    style_subtle("°"),
    rad,
    style_subtle("'"),
    pm[ifelse(sign >= 0, 1, 2)]
  )
  ret[is.na(x)] <- ""
  format(ret, justify = "right")
}

data
## # A tibble: 3 x 4
##   venue          year loc              paths 
##   <chr>         <int> <geo>            <list>
## 1 rstudio::conf  2017 28°20'N  81°33'W <geo> 
## 2 rstudio::conf  2018 32°43'N 117°10'W <geo> 
## 3 rstudio::conf  2019      NA          <geo>

Currently, ANSI escapes are not rendered in vignettes, so the display here isn’t much different from earlier examples. This may change in the future.

Fixing list columns

To tweak the output in the paths column, we simply need to indicate that our class is an S3 vector:

#' @export
is_vector_s3.latlon <- function(x) TRUE

data
## # A tibble: 3 x 4
##   venue          year loc              paths    
##   <chr>         <int> <geo>            <list>   
## 1 rstudio::conf  2017 28°20'N  81°33'W <geo [1]>
## 2 rstudio::conf  2018 32°43'N 117°10'W <geo [2]>
## 3 rstudio::conf  2019      NA          <geo [1]>

This is picked up by the default implementation of obj_sum(), which then shows the type and the length in brackets. If your object is built on top of an atomic vector, the default will be adequate. You will, however, need to provide an obj_sum() method for your class if your object is vectorised and built on top of a list.

An example of an object of this type in base R is POSIXlt: it is a list of component vectors (11 in the output below).

x <- as.POSIXlt(Sys.time() + c(0, 60, 3600)) 
str(unclass(x))
## List of 11
##  $ sec   : num [1:3] 4.51 4.51 4.51
##  $ min   : int [1:3] 36 37 36
##  $ hour  : int [1:3] 14 14 15
##  $ mday  : int [1:3] 7 7 7
##  $ mon   : int [1:3] 10 10 10
##  $ year  : int [1:3] 117 117 117
##  $ wday  : int [1:3] 2 2 2
##  $ yday  : int [1:3] 310 310 310
##  $ isdst : int [1:3] 0 0 0
##  $ zone  : chr [1:3] "CET" "CET" "CET"
##  $ gmtoff: int [1:3] 3600 3600 3600
##  - attr(*, "tzone")= chr [1:3] "" "CET" "CEST"

But it pretends to be a vector with 3 elements:

x
## [1] "2017-11-07 14:36:04 CET" "2017-11-07 14:37:04 CET"
## [3] "2017-11-07 15:36:04 CET"
length(x)
## [1] 3
str(x)
##  POSIXlt[1:3], format: "2017-11-07 14:36:04" "2017-11-07 14:37:04" ...

So we need to define a method that returns a character vector the same length as x:

#' @export
obj_sum.POSIXlt <- function(x) {
  rep("POSIXlt", length(x))
}
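The mismatch between storage and apparent length, and the shape of the value returned by the method, can be checked with base R alone. The sketch below uses a fixed time in UTC for reproducibility and calls the method directly instead of going through the obj_sum() generic.

```r
# A fixed time in UTC keeps the example reproducible.
x <- as.POSIXlt(as.POSIXct("2017-11-07 14:36:04", tz = "UTC") + c(0, 60, 3600))

# The storage is a list of component vectors ...
is.list(unclass(x))
# [1] TRUE

# ... but the object behaves like a vector of three date-times.
length(x)
# [1] 3

# Copy of the method from the text, called directly here.
obj_sum.POSIXlt <- function(x) {
  rep("POSIXlt", length(x))
}
summary_labels <- obj_sum.POSIXlt(x)
summary_labels
# [1] "POSIXlt" "POSIXlt" "POSIXlt"
```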