To extend the tibble package for new types of columnar data, you need to understand how printing works. The presentation of a column in a tibble is powered by four S3 generics:
type_sum() determines what goes into the column header.
pillar_shaft() determines what goes into the body of the column.
is_vector_s3() and obj_sum() are used when rendering list columns.
If you have written an S3 or S4 class that can be used as a column, you can override these generics to make sure your data prints well in a tibble. To start, you must import the pillar package that powers the printing of tibbles. Either add pillar to the Imports: section of your DESCRIPTION, or simply call:
devtools::use_package("pillar")
This short vignette assumes a package that implements an S3 class "latlon" and uses roxygen2 to create documentation and the NAMESPACE file. For this vignette to work we need to attach pillar:
library(pillar)
We define a class "latlon" that encodes geographic coordinates in a complex number. For simplicity, the values are printed as degrees and minutes only.
#' @export
latlon <- function(lat, lon) {
  as_latlon(complex(real = lon, imaginary = lat))
}
#' @export
as_latlon <- function(x) {
  structure(x, class = "latlon")
}
#' @export
c.latlon <- function(x, ...) {
  as_latlon(NextMethod())
}
#' @export
`[.latlon` <- function(x, i) {
  as_latlon(NextMethod())
}
#' @export
format.latlon <- function(x, ..., formatter = deg_rad) {
  x_valid <- which(!is.na(x))
  lat <- unclass(Im(x[x_valid]))
  lon <- unclass(Re(x[x_valid]))
  ret <- rep("<NA>", length(x))
  ret[x_valid] <- paste(
    formatter(lat, c("N", "S")),
    formatter(lon, c("E", "W"))
  )
  format(ret, justify = "right")
}
deg_rad <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- trunc(x)
  x <- x - deg
  rad <- round(x * 60)
  sprintf("%3d°%.2d'%s", deg, rad, pm[ifelse(sign >= 0, 1, 2)])
}
#' @export
print.latlon <- function(x, ...) {
  cat(format(x), sep = "\n")
  invisible(x)
}
latlon(32.7102978, -117.1704058)
## 32°43'N 117°10'W
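To make the conversion in deg_rad() concrete, here is a small base-R sketch that traces a single latitude by hand. The variable names are illustrative only; note that the variable called rad in deg_rad() holds arc minutes, not radians.

```r
# Trace the degrees-and-minutes conversion for one latitude.
x <- 32.7102978
deg <- trunc(abs(x))                   # whole degrees: 32
minutes <- round((abs(x) - deg) * 60)  # 0.7102978 * 60 = 42.6..., rounds to 43
hemisphere <- c("N", "S")[ifelse(sign(x) >= 0, 1, 2)]

# Same layout as deg_rad(), without the padding of the degrees.
sprintf("%d\u00b0%.2d'%s", deg, minutes, hemisphere)
```

The result agrees with the latitude part of the printed value. Because the minutes are rounded, a fractional part very close to a full degree would print as 60 minutes; a more careful formatter might carry that over into the degrees.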
More methods are needed to make this class fully compatible with data frames; see, for example, the hms package for a more complete implementation.
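The c() and `[` methods defined earlier matter because base R's default implementations return a plain vector without the class attribute. The following self-contained sketch restates the constructor and the two methods in compact form and checks that the class survives both operations. The name pts is made up for this example.

```r
# Compact restatement of the constructor and methods from this vignette.
as_latlon <- function(x) structure(x, class = "latlon")
latlon <- function(lat, lon) as_latlon(complex(real = lon, imaginary = lat))
c.latlon <- function(x, ...) as_latlon(NextMethod())
`[.latlon` <- function(x, i) as_latlon(NextMethod())

pts <- latlon(c(28.3411783, 32.7102978), c(-81.5480348, -117.1704058))

# The class is restored after subsetting and after concatenation.
class(pts[2])
class(c(pts[1], pts[2]))

# Latitude lives in the imaginary part, longitude in the real part.
Im(unclass(pts))
Re(unclass(pts))
```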
Columns of this class can be used in a tibble right away, but the output will be less than ideal:
library(tibble)
data <- tibble(
  venue = "rstudio::conf",
  year = 2017:2019,
  loc = latlon(
    c(28.3411783, 32.7102978, NA),
    c(-81.5480348, -117.1704058, NA)
  ),
  paths = list(
    loc[1],
    c(loc[1], loc[2]),
    loc[2]
  )
)
data
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <S3: latlon> <list>
## 1 rstudio::conf 2017 -81.5480348+28.3411783i <S3: latlon>
## 2 rstudio::conf 2018 -117.1704058+32.7102978i <S3: latlon>
## 3 rstudio::conf 2019 <NA> <S3: latlon>
(The paths column is a list that contains arbitrary data, in our case latlon vectors. A list column is a powerful way to attach hierarchical or unstructured data to an observation in a data frame.)
The output has three main problems:
1. The column type of the loc column is displayed as <S3: latlon>. This default formatting works reasonably well for any kind of object, but the generated output may be too wide and waste precious space when displaying the tibble.
2. The values in the paths column are also displayed as <S3: latlon>.
3. The values in the loc column are formatted as complex numbers (the underlying storage), without using the format() method we have defined. This is by design.
In the remainder I’ll show how to fix these problems, and also how to implement rendering that adapts to the available width.
To display <geo> as the data type, we need to override the type_sum() method. This method should return a length-1 character vector that can be used in a column header. For your own classes, strive for an evocative abbreviation that’s under 6 characters.
#' @export
type_sum.latlon <- function(x) {
  "geo"
}
Because the value shown there doesn’t depend on the data, we just return a constant. (For date-times, the column info will eventually contain information about the timezone, see #53.)
data
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 -81.5480348+28.3411783i <geo>
## 2 rstudio::conf 2018 -117.1704058+32.7102978i <geo>
## 3 rstudio::conf 2019 <NA> <geo>
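A type_sum() method can be sanity-checked without printing a tibble. The sketch below calls the method directly (inside a package, pillar's generic would dispatch to it) and verifies the two properties asked for: a length-1 character vector that is under 6 characters. The object x and the variable label exist only for this example.

```r
type_sum.latlon <- function(x) {
  "geo"
}

# A stand-in object carrying the "latlon" class.
x <- structure(complex(real = -117.17, imaginary = 32.71), class = "latlon")

label <- type_sum.latlon(x)

# Fails loudly if the label is not a short, length-1 character vector.
stopifnot(is.character(label), length(label) == 1, nchar(label) < 6)
label
```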
To use our format method for rendering, we implement the pillar_shaft() method for our class. (A pillar is mainly a shaft (decorated with an ornament), with a capital above and a base below. Multiple pillars form a colonnade, which can be stacked in multiple tiers. This is the motivation behind the names in our API.)
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  new_pillar_shaft(out, align = "right")
}
The simplest variant calls our format() method; everything else is handled by pillar, in particular by the new_pillar_shaft() helper. Note how the align argument affects the alignment of NA values and of the column name and type.
data
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <geo>
## 2 rstudio::conf 2018 32°43'N 117°10'W <geo>
## 3 rstudio::conf 2019 NA <geo>
We could also use left alignment and indent only the NA values:
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  new_pillar_shaft(out, align = "left", na_indent = 5)
}
data
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <geo>
## 2 rstudio::conf 2018 32°43'N 117°10'W <geo>
## 3 rstudio::conf 2019 NA <geo>
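The visible effect of the two layouts can be imitated with base R's format(), which may help when deciding between them. This only illustrates the resulting strings and is not how pillar implements alignment; all names in the sketch are made up.

```r
vals <- c("28\u00b020'N  81\u00b033'W", "32\u00b043'N 117\u00b010'W", NA)

# Right alignment: every entry, including the NA, is padded on the left.
right <- format(ifelse(is.na(vals), "NA", vals), justify = "right")

# Left alignment, with only the NA indented by five spaces.
na_text <- paste0(strrep(" ", 5), "NA")
left <- format(ifelse(is.na(vals), na_text, vals), justify = "left")

writeLines(right)
writeLines(left)
```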
If there is not enough space to render the values, the formatted values are truncated with an ellipsis. This doesn’t currently apply to our class, because we haven’t specified a minimum width for our values:
print(data, width = 35)
## # A tibble: 3 x 4
## venue year loc
## <chr> <int> <geo>
## 1 rstudio… 2017 28°20'N 81°33'W
## 2 rstudio… 2018 32°43'N 117°10'W
## 3 rstudio… 2019 NA
## # ... with 1 more variable:
## # paths <list>
If we specify a minimum width when constructing the shaft, the loc column will be truncated:
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  new_pillar_shaft(out, align = "right", min_width = 10)
}
print(data, width = 35)
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <lis>
## 1 rstudi… 2017 28°20'N 8… <geo>
## 2 rstudi… 2018 32°43'N 11… <geo>
## 3 rstudi… 2019 NA <geo>
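For intuition, the truncation seen in this output can be approximated in a few lines of base R. The helper truncate_to() is hypothetical and is not pillar's implementation; it merely reproduces the visible effect of cutting values to a given width and marking the cut with an ellipsis.

```r
# Hypothetical helper: shorten strings wider than `width`, keep NA as is.
truncate_to <- function(x, width) {
  too_long <- !is.na(x) & nchar(x) > width
  x[too_long] <- paste0(substr(x[too_long], 1, width - 1), "\u2026")
  x
}

# Strings as produced by format.latlon() for the two known locations.
vals <- c(" 28\u00b020'N  81\u00b033'W", " 32\u00b043'N 117\u00b010'W", NA)

shortened <- truncate_to(vals, 12)
shortened
```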
This may be useful for character data, but for lat-lon data we may prefer to show full degrees and remove the minutes if the available space is not enough to show accurate values. A more sophisticated implementation of the pillar_shaft() method is required to achieve this:
#' @export
pillar_shaft.latlon <- function(x, ...) {
  deg <- format(x, formatter = deg)
  deg[is.na(x)] <- style_na("NA")
  deg_rad <- format(x)
  deg_rad[is.na(x)] <- style_na("NA")
  ret <- structure(
    list(deg = deg, deg_rad = deg_rad),
    class = c("pillar_shaft_latlon", "pillar_shaft")
  )
  ret <- set_width(ret, max(crayon::col_nchar(deg_rad), 0))
  ret <- set_min_width(ret, max(crayon::col_nchar(deg), 0))
  ret
}
Here, pillar_shaft() returns an object of the "pillar_shaft_latlon" class (which is also a "pillar_shaft") that contains the necessary information to render the values, and also minimum and maximum width values. For simplicity, both formattings are pre-rendered, and the minimum and maximum widths are computed from there. Note that we also need to take care of NA values explicitly. (crayon::col_nchar() is like nchar() but strips the formatting added by style_na().)
For completeness, the code that implements the degree-only formatting looks like this:
deg <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- round(x)
  sprintf("%d°%s", deg, pm[ifelse(sign >= 0, 1, 2)])
}
All that’s left to do is to implement a format() method for our new "pillar_shaft_latlon" class. This method will be called with a width argument, which then determines which of the formattings to choose:
#' @export
format.pillar_shaft_latlon <- function(x, width, ...) {
  if (all(crayon::col_nchar(x$deg_rad) <= width)) {
    ornament <- x$deg_rad
  } else {
    ornament <- x$deg
  }
  new_ornament(ornament)
}
data
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <geo>
## 2 rstudio::conf 2018 32°43'N 117°10'W <geo>
## 3 rstudio::conf 2019 NA <geo>
print(data, width = 35)
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <lis>
## 1 rstudi… 2017 28°N 82°W <geo>
## 2 rstudi… 2018 33°N 117°W <geo>
## 3 rstudi… 2019 NA <geo>
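The decision made in format.pillar_shaft_latlon() can be isolated from pillar and tried with plain strings. In this sketch the helper pick_rendering() and the two pre-rendered vectors are assumptions for illustration. The rule is the same as in the method: use the wide form only if every value fits into the available width.

```r
# Pre-rendered strings: degrees and minutes (wide), rounded degrees (narrow).
wide   <- c(" 28\u00b020'N  81\u00b033'W", " 32\u00b043'N 117\u00b010'W")
narrow <- c("28\u00b0N 82\u00b0W", "33\u00b0N 117\u00b0W")

# Hypothetical helper mirroring the width check in the format() method.
pick_rendering <- function(wide, narrow, width) {
  if (all(nchar(wide) <= width)) wide else narrow
}

pick_rendering(wide, narrow, width = 20)  # every wide value fits
pick_rendering(wide, narrow, width = 12)  # falls back to the narrow values
```

Deciding for the column as a whole, rather than per value, keeps all rows in the same notation.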
Both new_pillar_shaft() and new_ornament() accept escape codes for coloring, emphasis, or other ways of highlighting text on terminals that support it. Some formattings are predefined, e.g. style_subtle() displays text in a light gray. For default data types, this style is used for insignificant digits. We’ll be formatting the degree and minute signs in a subtle style, because they serve only as separators. You can also use the crayon package to add custom formattings to your text.
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x, formatter = deg_rad_color)
  out[is.na(x)] <- NA
  new_pillar_shaft(out, align = "left", na_indent = 5)
}
deg_rad_color <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- trunc(x)
  x <- x - deg
  rad <- round(x * 60)
  ret <- sprintf(
    "%d%s%.2d%s%s",
    deg,
    style_subtle("°"),
    rad,
    style_subtle("'"),
    pm[ifelse(sign >= 0, 1, 2)]
  )
  ret[is.na(x)] <- ""
  format(ret, justify = "right")
}
data
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <geo>
## 2 rstudio::conf 2018 32°43'N 117°10'W <geo>
## 3 rstudio::conf 2019 NA <geo>
Currently, ANSI escapes are not rendered in vignettes, so the display here isn’t much different from earlier examples. This may change in the future.
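Even when the escapes are not visible, they are part of the strings, which is why the width-adaptive shaft shown earlier measures widths with crayon::col_nchar() rather than nchar(). The base-R sketch below shows the difference. The gray escape sequence used here is an assumption for illustration; style_subtle() may emit a different one.

```r
# A degree sign wrapped in an (assumed) ANSI "bright black" sequence.
styled <- paste0("28", "\033[90m", "\u00b0", "\033[39m", "20'N")

# nchar() counts the escape characters as well.
nchar(styled)

# Stripping the SGR sequences gives the width as seen on screen.
plain <- gsub("\033\\[[0-9;]*m", "", styled)
nchar(plain)
```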
To tweak the output in the paths column, we simply need to indicate that our class is an S3 vector:
#' @export
is_vector_s3.latlon <- function(x) TRUE
data
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <geo [1]>
## 2 rstudio::conf 2018 32°43'N 117°10'W <geo [2]>
## 3 rstudio::conf 2019 NA <geo [1]>
This is picked up by the default implementation of obj_sum(), which then shows the type and the length in brackets. If your object is built on top of an atomic vector, the default will be adequate. You will, however, need to provide an obj_sum() method for your class if your object is vectorised and built on top of a list.
An example of an object of this type in base R is POSIXlt: it is a list of components (11 in the output below).
x <- as.POSIXlt(Sys.time() + c(0, 60, 3600))
str(unclass(x))
## List of 11
## $ sec : num [1:3] 4.51 4.51 4.51
## $ min : int [1:3] 36 37 36
## $ hour : int [1:3] 14 14 15
## $ mday : int [1:3] 7 7 7
## $ mon : int [1:3] 10 10 10
## $ year : int [1:3] 117 117 117
## $ wday : int [1:3] 2 2 2
## $ yday : int [1:3] 310 310 310
## $ isdst : int [1:3] 0 0 0
## $ zone : chr [1:3] "CET" "CET" "CET"
## $ gmtoff: int [1:3] 3600 3600 3600
## - attr(*, "tzone")= chr [1:3] "" "CET" "CEST"
But it pretends to be a vector with 3 elements:
x
## [1] "2017-11-07 14:36:04 CET" "2017-11-07 14:37:04 CET"
## [3] "2017-11-07 15:36:04 CET"
length(x)
## [1] 3
str(x)
## POSIXlt[1:3], format: "2017-11-07 14:36:04" "2017-11-07 14:37:04" ...
So we need to define a method that returns a character vector the same length as x:
#' @export
obj_sum.POSIXlt <- function(x) {
  rep("POSIXlt", length(x))
}
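A brief check of this method, calling it directly rather than through pillar's generic. It also shows the mismatch that makes the method necessary: the underlying list has one entry per component, while the vector has one entry per date-time. The fixed time stamp and the UTC time zone are chosen here only to keep the example reproducible.

```r
obj_sum.POSIXlt <- function(x) {
  rep("POSIXlt", length(x))
}

start <- as.POSIXct("2017-11-07 14:36:04", tz = "UTC")
x <- as.POSIXlt(start + c(0, 60, 3600))

length(unclass(x))   # number of list components
length(x)            # number of date-times
obj_sum.POSIXlt(x)   # one summary string per date-time
```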