There is no common form for spatial data that covers the complexity of geometric and topological types widely used in R. Traditional, GIS-alike spatial data is exceptional in the way that it does not make use of normal forms for geometric representations. We are working on a framework for complex, hierarchical data that can incorporate standard spatial types and allows working with them in a general and flexible way. We provide tools in R for more general representations of spatial primitives and the intermediate forms required for translation and analytical tasks.

The simple features standard has the following limitations:

shapes are represented as paths so only planar polygonal shapes are possible
the standard allows for XY[Z[M]] geometry,but this is not extensible - there is no capacity to store data against component geometry elements
there is no capacity for internal topology (no vertex-, edge-, or path-sharing).

These limitations mean that it cannot represent in-full every-day objects from GPS, rgl, ggplot2/ggvis, spatstat, maps, TopoJSON, CAD drawings, or from 3D or general model structures. Translations between geo-spatial forms and the graphics and data grammars can be disjointed and sometimes awkward, relying on localized implementations that are lossy or inefficient, require 3rd party workflows, or involve unnecessary tasks.

Simple features is rightly seen as a corner-stone resource, but as a central basis for translations it is only able to handle a subset of the wider remit of “spatial data” in R. Topology in the form of component-element sharing (indexing of vertex, edge, arc, path) is not available to simple features, and while there are tools to generate it for certain planar cases, these are not explicitly available outside provided workflows and this information is not generally available for extensible uses.

Translation patterns that use simple features to translate data from native formats result in loss of information requiring complicated workarounds to preserve it. Common translations include in-memory structural representations, serialized forms within file formats, coordinate system geometry transformations, and topological or shape-modifying transformations.

Illustration

This code generates a simple data set with the following properties:

a data set of 2 POLYGONs
the first polygon has one hole and one “concavity” (a three-vertex section of the path that differentiates the island from its convex hull)
the second polygon is a single, convex island, and shares one edge with the first polygon

## two POLYGONs, composed of three rings p1+p2 and p3
p1 <- cbind(x = c(0, 0, 0.75, 1,   0.5, 0.8, 0.69, 0), 
            y = c(0, 1, 1,    0.8, 0.7, 0.6, 0,    0))
p2 <- cbind(x = c(0.2, 0.2, 0.3, 0.5, 0.5, 0.2), 
            y = c(0.2, 0.4, 0.6, 0.4, 0.2, 0.2))
p3 <- cbind(x = c(0.69, 0.8, 1.1, 1.23, 0.69), 
            y = c(0, 0.6, 0.63, 0.3, 0))
library(sf)
g <- data.frame(two_polygons = 1:2)
g[["geometry"]] <- st_sfc(st_polygon(list(p1, p2[nrow(p2):1, ])), st_polygon(list(p3)))
g <- st_as_sf(g)

## Delaunay triangulation of the polygon vertices, grouped by feature
gt <- g
st_geometry(gt)  <- st_triangulate(st_geometry(g))

op <- par(mfrow = c(1, 2))
plot(g)
plot(gt, col = scales::alpha(c("dodgerblue", "firebrick"), 0.3), main = "convex triangulation")
plot(st_geometry(g), add = TRUE, lwd = 4)

par(op)

There is no capacity in the simple features standard to describe this situation in topological terms. The representation of these objects is always fully expanded, all instances of the vertices are stored explicitly with no record of duplications that close the rings within a polygon , or that represent shared coordinates between the two polygons. The relationship between the polygon island and its hole is unambiguous, but there is no identity of the hole or what it represents. There is no capacity to distinguish it from other parts of the polygon, and this is true of ever more complex objects with multiple islands, multiple holes, multiple connected lines and line segments.

Similarly in a MULTIPOINT there is no identity or grouping to each component vertex within the whole, and a single vertex has no information stored with it other than its position in an array and its raw geometric properties.

In the second panel is a Delaunay triangulation of the coordinates, grouped by feature. The triangles do not align with the polygon edges, because the triangle edges were not included in the triangulation algorithm. There is also an inserted overlap between the polygons, which makes this data set invalid for many standard GIS constructs. This is not a criticism of st_triangulate though, because even when we perform a constrained triangulation (Cheng et al., 2012), the simple features standard dictates that we store these all in TRIANGLE form, each a ring of four explicit coordinates within a GEOMETRYCOLLECTION. This can store the non-convex form of the triangulation, but only in fully expanded form - i.e. as a mesh with no indexing (http://postgis.net/docs/ST_GeometryN.html).

We can detect these properties using the in-built tools and return the exact relationship in object form. These tools internally build topological structure and then discard it.

## yes the first touches the second, and vice versa
sf::st_touches(g)

## [[1]]
## [1] 2
## 
## [[2]]
## [1] 1

## each only covers itself, not the other
sf::st_covered_by(g)

## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2

sf::st_crosses(g)

## [[1]]
## integer(0)
## 
## [[2]]
## integer(0)

sf::st_intersects(g)

## [[1]]
## [1] 1 2
## 
## [[2]]
## [1] 1 2

i <- sf::st_intersection(g[1, ], g[2, ])

## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries

plot(st_geometry(g))
plot(st_geometry(i), add = TRUE, lwd = 3, col = "dodgerblue")

op <- par(xpd = NA)
plot(st_geometry(st_cast(i, "MULTIPOINT")), add = TRUE, col = "firebrick",cex = 4)

par(op)

We got what we wanted, but how can we put this information back into the original object?

Alternative representations

A path can be treated as a first-class type and and stored as such within a relational model, along with the other entities objects (“features”) and vertices. with this approach we gain two advantages, we can normalize the relations (detect and remove redundancy) and also store any additional data about the entitities in the model.

(There is an interplay between “able to store extra information” and “able to normalize”, since extra data may introduce further redundancy, but we defer this issue for now since full normalization is not our primary task).

An alternative form of representing this information is in the sc package, providing two main schemes, the PATH and the PRIMITIVE models. These provide different ways of linking objects and vertices. The two models are not mutually exclusive, and can co-exist sensibly within one data set. The PATH model can always be derived from the PRIMITIVE model and vice versa but the PRIMITIVE model has extra capacities that PATH cannot provide.

The key feature in sc is a relational model of indexed primitives and component elements. This provides a bridge to the traditionally structural, or array/matrix indexing and storage used in computer graphics and gaming.

The PATH models consists of three tables to store the objects (“features”), paths, and vertices, plus a fourth table path_link_vertex that allows de-duplication of the vertices (here in X-Y geometry).

#devtools::install_github("mdsumner/sc")
#devtools::install_github("mdsumner/scsf")
library(sc)
library(scsf)
## build the PATH model from a simple features object
str(sc::PATH(g))

## List of 4
##  $ object          :Classes 'tbl_df', 'tbl' and 'data.frame':    2 obs. of  2 variables:
##   ..$ two_polygons: int [1:2] 1 2
##   ..$ object_     : chr [1:2] "4f9caa06" "319f3072"
##   ..- attr(*, "sf_column")= chr "geometry"
##   ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
##   .. ..- attr(*, "names")= chr "two_polygons"
##  $ path            :Classes 'tbl_df', 'tbl' and 'data.frame':    3 obs. of  3 variables:
##   ..$ ncoords_: int [1:3] 8 6 5
##   ..$ path_   : chr [1:3] "d225162a" "431ca458" "f05e7573"
##   ..$ object_ : chr [1:3] "4f9caa06" "4f9caa06" "319f3072"
##  $ vertex          :Classes 'tbl_df', 'tbl' and 'data.frame':    19 obs. of  3 variables:
##   ..$ x_     : num [1:19] 0 0 0.75 1 0.5 0.8 0.69 0 0.2 0.5 ...
##   ..$ y_     : num [1:19] 0 1 1 0.8 0.7 0.6 0 0 0.2 0.2 ...
##   ..$ vertex_: chr [1:19] "a2821e22" "a2821e22" "a2821e22" "a2821e22" ...
##  $ path_link_vertex:Classes 'tbl_df', 'tbl' and 'data.frame':    3 obs. of  2 variables:
##   ..$ path_  : chr [1:3] "d225162a" "431ca458" "f05e7573"
##   ..$ vertex_: chr [1:3] "a2821e22" "b893ca0d" "92b3fbff"
##  - attr(*, "class")= chr [1:2] "PATH" "sc"
##  - attr(*, "join_ramp")= chr [1:4] "object" "path" "path_link_vertex" "vertex"

The PRIMITIVE model is derived from and incorporates the PATH model, but adds 1-dimensional primitives (segment) to the collection of tables.

set.seed(39)
names(prim <- sc::PRIMITIVE(g))

## [1] "object"           "path"             "vertex"          
## [4] "path_link_vertex" "segment"

str(prim[c("segment")])

## List of 1
##  $ segment:Classes 'tbl_df', 'tbl' and 'data.frame': 16 obs. of  4 variables:
##   ..$ .vertex0: chr [1:16] "1c324154" "1c324154" "1c324154" "1c324154" ...
##   ..$ .vertex1: chr [1:16] "1c324154" "1c324154" "1c324154" "1c324154" ...
##   ..$ path_   : chr [1:16] "3ed7380e" "3ed7380e" "3ed7380e" "3ed7380e" ...
##   ..$ segment_: chr [1:16] "9ef7f4d8" "6438e89d" "e67be168" "60c3f8f0" ...

These representations provide a relational form stored in tables. This kind of data is usually stored as structural arrays, where the indexing is inherently related to the physical size of the vertex array (this is what an rgl object is). The segment table adds links to the first and second vertex of every line segment, and also records which path it belongs to. This is not the final design, since a segment can belong to multiple paths (albeit with a different orientation), and it is not clear how much one of these forms requires all of the path and primitive information together.

We can extract the topological information directly, because it’s actually stored explicitly within the data structures already.

library(dplyr)

get_shared_edge <- function(x, ...) UseMethod("get_shared_edge")
get_shared_edge.sf <- function(x, ...) {
  prim <- sc::PRIMITIVE(x)
  shared <- prim$segment %>% dplyr::select(.vertex0, .vertex1) %>% 
    mutate_all(funs("I" = as.integer(factor(.))))  %>% 
    mutate(edge_id = paste(pmin(.vertex0_I, .vertex1_I), pmax(.vertex0_I, .vertex1_I), sep = "_")) %>% 
    group_by(edge_id) %>% filter(n() > 1) %>% ungroup() %>% select(.vertex0, .vertex1, edge_id)
  
  ## use this information to generate sf output
  g0 <- shared %>% inner_join(prim$vertex, c(".vertex0" = "vertex_")) %>% split(.$edge_id) %>% 
    purrr::map(function(x) st_linestring(as.matrix(x %>% dplyr::select(x_, y_)))) %>% st_sfc()
  st_sf(a = seq_along(g0), geometry = g0)
}

shared_e <- get_shared_edge(g)

plot(st_geometry(g))
plot(st_geometry(shared_e), add = TRUE, lwd = 3, col = "dodgerblue")

op <- par(xpd = NA)
plot(st_geometry(st_cast(shared_e, "MULTIPOINT")), add = TRUE, col = "firebrick",cex = 4)

par(op)

Apply this idea to a more complex data set.

nc <- st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)


edge1 <- get_shared_edge(nc)
plot(st_geometry(nc), col = viridisLite::viridis(nrow(nc)))

plot(st_geometry(edge1), col = viridisLite::inferno(nrow(edge1)), lwd = 2)

The edge1 data set is now an intermediate structure on the way to being an arc-node topology representation of all the shared arcs in this polygon layer. The implementation of this will avoid creating the simple features output however, as that is not able to natively store the index arc-paths.

Work ahead

Research the best ways forward

Investigate options for front-end user interfaces and back-end systems, and seek advice from key experts.

Implementations that we have considered include:

lists of tables, as illustrated in proto-forms in spbabel, rangl, rbgm
sf-like forms with nested list-columns of indexes, shared-entity semantics

and these raise the key issue that one data frame cannot store topological data where geometric elements are shared between features. Lists of tables are fine, and work well as a development idiom, but require some advanced design to find the right user-experience. We have erred on the side of unique indexes that allow arbitrary splitting and recombination of entities, but this is obviously expensive and requires work to only apply unique indexing when required, and otherwise use structural or matrix idioms for efficiencies.

Other options might include

advanced programming techniques, using environments, R6, with vertex/primitives pools
database or database-like connections in list-columns

Define language and classifications for the forms

The following entities underlie spatial objects: vertices, coordinates (instances of vertices), line segments, paths (linestrings, polygon rings, arcs), parts (rings, holes, linestrings, points), and objects or features.

We need relational models and the following broad components to a grammar of spatial data:

a language of paths-belong-features, (a POINT can be a degenerate PATH for a MULTIPOINT) this incorporates TopoJSON, the internal maps store
a language of vertices-belong-edges-belong-parts-belong-features, all sf is expressible with 1-D primitives, and this scheme incorporates GPS and other tracking-sensor data
a language of vertices-belong-triangles-belong-parts-belong-features, which incorporates rgl, CAD drawings, 3D and general models.

Key outputs

Provide tools for decomposing geo-spatial and other complex data to common general forms, including topological indexing.
Illustrate general workflow with tools to convert between sf, GeoJSON, TopoJSON, leaflet list-forms, and rgl and plotly
Generate a classification of the broad class of “spatial data” in R that incorporates simple features and other forms and guides translation efforts across R packages. These are patterns that are, for the most part, user-accessible, so creating modified or specialized versions that are more efficient or better focussed for particular tasks will be straightforward.
Implement a prototype general-form geo-spatial-graphics data structure that can store geometry, topology, aesthetic mappings to bridge the creation of hierarchical data in the tidyverse with its visualization and analysis.

Supporting documentation

The sc package is at an early stage, but a predecessor rangl takes these ideas further in order to work with the 2-D primitives. One of the problems with rangl is a reliance on the RTriangle package, and future work with BOOST or CGAL could remove the need to rely on the restrictive license used by RTriangle.

The discussion around these prospects is here: https://github.com/r-spatial/discuss/issues/6

The rangl package can build the full mesh of triangles, and convert it to a form understood by rgl. These representations provide a great amount of flexiblity for model representation and analysis, with the inherent relationships between primitives stored explicitly.

## devtools::install_github("r-gris/rangl")
library(rangl)
g_rangl <- rangl(as(g, "Spatial"))

str(g_rangl)

## List of 5
##  $ o   :Classes 'tbl_df', 'tbl' and 'data.frame':    2 obs. of  2 variables:
##   ..$ two_polygons: int [1:2] 1 2
##   ..$ object_     : chr [1:2] "AmgeuVrCQz" "Ppjuq8XKZq"
##  $ v   :Classes 'tbl_df', 'tbl' and 'data.frame':    14 obs. of  3 variables:
##   ..$ x_     : num [1:14] 0.2 0.69 0.5 0 0.2 0 0.3 0.5 0.8 0.5 ...
##   ..$ y_     : num [1:14] 0.2 0 0.2 0 0.4 1 0.6 0.7 0.6 0.4 ...
##   ..$ vertex_: chr [1:14] "um12iv4OX7" "L3QXpbcpMY" "R6f7UP8MhT" "fc0eXPo0Uu" ...
##  $ t   :Classes 'tbl_df', 'tbl' and 'data.frame':    14 obs. of  2 variables:
##   ..$ triangle_: chr [1:14] "xqHgkZaTLI" "bVMZNuhJ8u" "obD5QaRCn5" "gu0Hj71oiJ" ...
##   ..$ object_  : chr [1:14] "AmgeuVrCQz" "AmgeuVrCQz" "AmgeuVrCQz" "AmgeuVrCQz" ...
##  $ tXv :Classes 'tbl_df', 'tbl' and 'data.frame':    42 obs. of  2 variables:
##   ..$ triangle_: chr [1:42] "xqHgkZaTLI" "xqHgkZaTLI" "xqHgkZaTLI" "bVMZNuhJ8u" ...
##   ..$ vertex_  : chr [1:42] "um12iv4OX7" "L3QXpbcpMY" "R6f7UP8MhT" "L3QXpbcpMY" ...
##  $ meta:Classes 'tbl_df', 'tbl' and 'data.frame':    1 obs. of  4 variables:
##   ..$ proj : chr NA
##   ..$ x    : chr "x_"
##   ..$ y    : chr "y_"
##   ..$ ctime: chr "2017-02-26 11:19:52"
##  - attr(*, "class")= chr "trimesh"

## it's fairly trivial to convert from relational form
## to the structures used by rgl
rgl_form <- plot(g_rangl)

## Joining, by = "object_"

## Joining, by = "triangle_"

rgl::rglwidget(elementId="rangl")

A 3D visualization is not that useful in itself, but is here simply to prove that we can use a more general scheme to store simple features data as topological data, maintaining the higher levels of grouping (features, parts) but also being able to store more data against the underlying entities, since they are all stored in tables.

The rgl form of the data is two arrays, one of the actual vertices and the other of the primitives, an array of -3 indexes into the vertices.

str(rgl_form)

## List of 6
##  $ vb           : num [1:4, 1:14] 0.2 0.2 0 1 0.69 0 0 1 0.5 0.2 ...
##  $ it           : int [1:3, 1:14] 1 2 3 2 1 4 4 1 5 6 ...
##  $ primitivetype: chr "triangle"
##  $ material     : list()
##  $ normals      : NULL
##  $ texcoords    : NULL
##  - attr(*, "class")= chr [1:2] "mesh3d" "shape3d"

https://rpubs.com/cyclemumner/spatial-normal-forms

References

Cheng, Siu-Wing, Tamal K. Dey, and Jonathan Shewchuk. Delaunay mesh generation. CRC Press, 2012. https://www.crcpress.com/Delaunay-Mesh-Generation/Cheng-Dey-Shewchuk/p/book/9781584887300

The PostGIS Development Group, PostGIS 2.3.2 Manual, SVN Revision (15301) accessed 2017-01-30 http://postgis.net/docs/ST_GeometryN.html

Document preparation

devtools::session_info()

## Session info -------------------------------------------------------------

##  setting  value                                      
##  version  R version 3.3.2 Patched (2017-02-05 r72121)
##  system   x86_64, mingw32                            
##  ui       RTerm                                      
##  language (EN)                                       
##  collate  English_Australia.1252                     
##  tz       Australia/Hobart                           
##  date     2017-02-26

## Packages -----------------------------------------------------------------

##  package     * version     date       source                            
##  assertthat    0.1         2013-12-06 CRAN (R 3.3.2)                    
##  backports     1.0.5       2017-01-18 CRAN (R 3.3.2)                    
##  colorspace    1.3-2       2016-12-14 CRAN (R 3.3.2)                    
##  DBI           0.5-16      2017-02-26 Github (rstats-db/DBI@66c53b9)    
##  devtools      1.12.0.9000 2017-02-04 Github (hadley/devtools@d8ab190)  
##  digest        0.6.12      2017-01-27 CRAN (R 3.3.2)                    
##  dplyr       * 0.5.0       2016-06-24 CRAN (R 3.3.1)                    
##  evaluate      0.10        2016-10-11 CRAN (R 3.3.2)                    
##  ggplot2       2.2.1.9000  2017-02-15 local                             
##  gridExtra     2.2.1       2016-02-29 CRAN (R 3.3.1)                    
##  gtable        0.2.0       2016-02-26 CRAN (R 3.3.0)                    
##  htmltools     0.3.5       2016-03-21 CRAN (R 3.3.0)                    
##  htmlwidgets   0.8         2016-11-09 CRAN (R 3.3.2)                    
##  httpuv        1.3.3       2015-08-04 CRAN (R 3.3.1)                    
##  ids           1.0.0       2016-11-04 CRAN (R 3.3.2)                    
##  jsonlite      1.2         2016-12-31 CRAN (R 3.3.2)                    
##  knitr         1.15.1      2016-11-22 CRAN (R 3.3.2)                    
##  lattice       0.20-34     2016-09-06 CRAN (R 3.3.2)                    
##  lazyeval      0.2.0.9000  2017-01-13 Github (hadley/lazyeval@c155c3d)  
##  magrittr      1.5         2014-11-22 CRAN (R 3.3.0)                    
##  memoise       1.0.0       2016-01-29 CRAN (R 3.3.2)                    
##  mime          0.5         2016-07-07 CRAN (R 3.3.2)                    
##  munsell       0.4.3       2016-02-13 CRAN (R 3.3.0)                    
##  openssl       0.9.6       2016-12-31 CRAN (R 3.3.2)                    
##  pkgbuild      0.0.0.9000  2016-11-15 Github (r-pkgs/pkgbuild@65eace0)  
##  pkgload       0.0.0.9000  2017-02-17 Github (r-pkgs/pkgload@fc907a1)   
##  plyr          1.8.4       2016-06-08 CRAN (R 3.3.0)                    
##  purrr         0.2.2       2016-06-18 CRAN (R 3.3.1)                    
##  R6            2.2.0       2016-10-05 CRAN (R 3.3.2)                    
##  rangl       * 0.4.0       2017-02-01 Github (r-gris/rangl@ec8ee25)     
##  raster        2.5-8       2016-06-02 CRAN (R 3.3.2)                    
##  Rcpp          0.12.9.3    2017-02-20 Github (RcppCore/Rcpp@8bf15c0)    
##  rgl           0.97.0      2017-01-10 CRAN (R 3.3.2)                    
##  rmarkdown     1.3.9003    2017-02-24 Github (rstudio/rmarkdown@cae9df8)
##  rprojroot     1.2         2017-01-16 CRAN (R 3.3.2)                    
##  RTriangle     1.6-0.8     2016-07-01 CRAN (R 3.3.1)                    
##  sc          * 0.0.1.9000  2017-02-26 local                             
##  scales        0.4.1       2016-11-09 CRAN (R 3.3.2)                    
##  scsf        * 0.0.2.9001  2017-02-26 local                             
##  sf          * 0.3-5       2017-02-24 Github (edzer/sfr@6279bd3)        
##  shiny         1.0.0       2017-01-12 CRAN (R 3.3.2)                    
##  sp            1.2-4       2016-12-22 CRAN (R 3.3.2)                    
##  spbabel       0.4.5.9001  2017-02-01 Github (mdsumner/spbabel@739c07c) 
##  stringi       1.1.2       2016-10-01 CRAN (R 3.3.1)                    
##  stringr       1.1.0       2016-08-19 CRAN (R 3.3.1)                    
##  tibble        1.2         2016-08-26 CRAN (R 3.3.2)                    
##  udunits2      0.13        2016-11-17 CRAN (R 3.3.2)                    
##  units         0.4-2       2017-01-13 CRAN (R 3.3.2)                    
##  unjoin        0.0.0.9000  2017-02-26 Github (mdsumner/unjoin@b5478ba)  
##  uuid          0.1-2       2015-07-28 CRAN (R 3.3.2)                    
##  viridis       0.3.4       2016-03-12 CRAN (R 3.3.1)                    
##  viridisLite   0.1.3       2016-03-12 CRAN (R 3.3.1)                    
##  withr         1.0.2       2016-06-20 CRAN (R 3.3.2)                    
##  xtable        1.8-2       2016-02-05 CRAN (R 3.3.0)                    
##  yaml          2.1.14      2016-11-12 CRAN (R 3.3.2)

A consistent common form for spatial data

Michael Sumner

2017-02-26