Spatial Data Structures with INLA and inlabru

Jafet Belmont

2022-03-21

Ecological studies usually involve the analysis of data that are geographically referenced. Over the last decade, different packages have been developed to handle geographic data in R with the sp package being for many years one of the key geospatial R packages for handling spatial data. However, new technologies have shown an important increase in the accuracy of georeferencing, leading to a more precise coordinate representation. In consequence, the code in which coordinate reference systems (CRS) are written has been updated. To have a better understanding of what these changes are and a bit of the history behind, I recommend the following reading (this document also explains some of the warning you get when working with the sp package):

https://psfaculty.plantsciences.ucdavis.edu/plant/additionaltopics_datumwarning.pdf

In summary, recent changes in GDAL and PROJ packages (and in consequence any package that depended on these ones) do not longer support using proj4strings to represent a CRS because it cannot hold all the details of a projection. Intead, the current approach to represent a CRS is with the WKT2 strings, a standardized text-based format developed by the Open Geospatial Consortium (OGC). However, coding the WKT2 manually can be difficult due to the complexity involved in the CRS specification. A common aproach to address this is to specify the EPSG code (European Petroleum Survey Group) which is a uniform mapping system that identifies each coordinate refence system and projections. As a consequence of all of these changes, the OGC developed the sf package as standardized way to encode spatial data that will envetualy replace the sp and rgeos packages. The sf has gained a lot of support and different resources and tutorials have become available. Here are some tutorials you might find useful to get you started with the sf library.

The sf package provides a more convenient and flexible framework for handling spatial data by defining more intuitive spatial data-structures compared to sp. However sf is somewhat new, and a lot packages still rely on the spatial classes defined by the sp package (such as INLA and inlabru). Since the sp is no longer supporeted, we should see an increasing shift towards sf in the nearby future. In the meantime, users have suggested to do most of the spatial operations in sf and convert them to a sp spatial class when needed.

Vignette Info

In the next few sections I will go throught some examples for handling spatial objects, CRS and projections using sf and sp spatial classes and their implementation with INLA and inlabru. I will fit a LGCP using inlabru to estimate the sampling intensity in the British Dragonfly Society Recording Scheme (2020). This case of study illustrates how different spatial strucutres interact within INLA and how the CRS and projections should be handled according to the new updated GDAL and PROJ libraries .

sp package

Reading a shapefile through sp or using pre-exiting data sets will typlically create a SpatialPolygonsDataFrame class object. I will upload a short guide about how to work with spatial objects in R I made a few years back that explain the different spatial structures implemented in the sp package (which now seems pointless but gives a good idea of why sf is more intuitive and integrates well with DSL such as ggplot and other tidyverse libraries).

I will use the raster library to retrieve the border of England which can be used to define a non-stationary model. Notice that the terra package, which is compatible with the new changes in GDAL and PROJ, has now been created as a replacement for the raster library. So I will only use the raster package to retrieve the data. For simplicifty, I will remove all of the smaller detached islands by using the ms_filter_islands function in the rmapshaper package (this package fully supports sf or sfc polygons object as well).

library(raster)
library(rmapshaper)
library(tidyr)

uk_mask <- getData('GADM', country='GBR', level=1)
England <- uk_mask[uk_mask$NAME_1 == "England",]
class(England)
#> [1] "SpatialPolygonsDataFrame"
#> attr(,"package")
#> [1] "sp"

England_mainland_sp <- ms_filter_islands(England, min_area = 12391399903)
plot(England_mainland_sp)

We can get the CRS of a Spatial object (i.e. using the WKT2 string format) in sp as follows:

cat(wkt(England_mainland_sp))
#> GEOGCRS["unknown",
#>     DATUM["World Geodetic System 1984",
#>         ELLIPSOID["WGS 84",6378137,298.257223563,
#>             LENGTHUNIT["metre",1]],
#>         ID["EPSG",6326]],
#>     PRIMEM["Greenwich",0,
#>         ANGLEUNIT["degree",0.0174532925199433],
#>         ID["EPSG",8901]],
#>     CS[ellipsoidal,2],
#>         AXIS["longitude",east,
#>             ORDER[1],
#>             ANGLEUNIT["degree",0.0174532925199433,
#>                 ID["EPSG",9122]]],
#>         AXIS["latitude",north,
#>             ORDER[2],
#>             ANGLEUNIT["degree",0.0174532925199433,
#>                 ID["EPSG",9122]]]]

We want to set the CRS to the British National Grid (BNG) which has an EPSG code of 27700. We will apply this CRS to all of our spatial objects to ensure they are all on the same CRS.

BNGproj<-  CRS(SRS_string = "EPSG:27700")  
#> Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
#> prefer_proj): Discarded datum OSGB 1936 in Proj4 definition
wkt_BNGproj <- wkt(BNGproj)
cat(wkt_BNGproj)
#> PROJCRS["OSGB 1936 / British National Grid",
#>     BASEGEOGCRS["OSGB 1936",
#>         DATUM["OSGB 1936",
#>             ELLIPSOID["Airy 1830",6377563.396,299.3249646,
#>                 LENGTHUNIT["metre",1]]],
#>         PRIMEM["Greenwich",0,
#>             ANGLEUNIT["degree",0.0174532925199433]],
#>         ID["EPSG",4277]],
#>     CONVERSION["British National Grid",
#>         METHOD["Transverse Mercator",
#>             ID["EPSG",9807]],
#>         PARAMETER["Latitude of natural origin",49,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8801]],
#>         PARAMETER["Longitude of natural origin",-2,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8802]],
#>         PARAMETER["Scale factor at natural origin",0.9996012717,
#>             SCALEUNIT["unity",1],
#>             ID["EPSG",8805]],
#>         PARAMETER["False easting",400000,
#>             LENGTHUNIT["metre",1],
#>             ID["EPSG",8806]],
#>         PARAMETER["False northing",-100000,
#>             LENGTHUNIT["metre",1],
#>             ID["EPSG",8807]]],
#>     CS[Cartesian,2],
#>         AXIS["(E)",east,
#>             ORDER[1],
#>             LENGTHUNIT["metre",1]],
#>         AXIS["(N)",north,
#>             ORDER[2],
#>             LENGTHUNIT["metre",1]],
#>     USAGE[
#>         SCOPE["Engineering survey, topographic mapping."],
#>         AREA["United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore."],
#>         BBOX[49.75,-9,61.01,2.01]],
#>     ID["EPSG",27700]]

Then we can use the spTransform to assign this new projection to any spatial-class object defined by the sp library.

England_BNG = spTransform(England_mainland_sp, CRS=BNGproj)

In this particular case, England_mainland_sp was originally created from a pre-existing spatial-class object. However, we can also create a SpatialPointsDataFrame from the locations of our observations. The intensity data frame contains the location where different dragonflies and damselfies have been observed as part of the British Dragonfly Society Recording Scheme (2020).

#>        lat      long       OSGR Province
#> 1 51.93120 -0.830612     SP8026  England
#> 2 50.46988 -3.558427     SX8964  England
#> 3 51.40196 -0.187874   TQ261685  England
#> 4 50.54303 -4.824732 SW99957528  England
#> 5 51.40143 -0.182863     TQ2668  England
#> 6 50.81993 -0.390092     TQ1303  England

There are different ways of creating a SpatialPoints /SpatialPointsDataFrame in R. We could either (1) use the coordinates() function which transform our original non-spatial object into a spatial-class object, or (2) use the SpatialPoints() function to create objects of class SpatialPoints or SpatialPointsDataFrame from a set of coordinates and data.frames. I will illustrate the two different approaches in here:

coordinates(intensity) <-  ~ long + lat # this will set our original intensity object to be a SpatialPointsDataFrame class object
slot(intensity, "proj4string") <- CRS(SRS_string = "EPSG:4326") # assign the WGS 84 CRS
intensity<-spTransform(intensity,CRS= BNGproj) # project to the BNG

Alternatively,

intensity.BNG <- SpatialPoints(cbind(intensity$long,intensity$lat),
                                 proj4string= CRS(SRS_string = "EPSG:4326")) %>%
                 spTransform(CRS= BNGproj)
intensity.BNG
#> class       : SpatialPoints 
#> features    : 13435 
#> extent      : 182215, 655500, 35000.04, 653500  (xmin, xmax, ymin, ymax)
#> crs         : +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs

Notice that I used the WGS 84 (World Geodetic System 1984 - EPSG code:4326) to set the original CRS in which the points where measured and then reproject them to the BNG as we did with the England border. We can check if both CRS are the same as follows:

identicalCRS(England_BNG,intensity.BNG)
#> [1] TRUE

As an extra step, I removed any points that fell outside the boundary and then plot them along with England’s boundary.

# remove any observation outside the boundary
intensity_subset <- intensity.BNG[England_BNG, ]
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE

We have now the elements we need to build a mesh that we can use to fit our model. Ideally, the specification of the mesh should not treat the boundaries between England, Wales and Scotland as a coastline. But this is just a simplification of what could be modeled for the whole island (Great Britain). Mesh construction involving the smaller detached Islands could also be addressed in the upcoming Mesh construction tutorial.

The inla.sp2segment() can be used to specify the boundary and interior constraint edges for building up the mesh. This is the first function that depends on sp spatial-class objects. Thus, it is important to convert any other objects (including sf spatial class) to a sp spatial-class object such as SpatialPolygons, SpatialLines,SpatialPolygonsDataFrame.

library(INLA)
england.bdry <- inla.sp2segment(England_BNG)
class(england.bdry)
#> [1] "inla.mesh.segment"
str(england.bdry)
#> List of 5
#>  $ loc   : num [1:88304, 1:2] 565125 565125 565086 565089 565050 ...
#>  $ idx   : int [1:88304, 1:2] 1 88304 88303 88302 88301 88300 88299 88298 88297 88296 ...
#>  $ grp   : int [1:88304, 1] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ is.bnd: logi FALSE
#>  $ crs   :Formal class 'CRS' [package "sp"] with 1 slot
#>   .. ..@ projargs: chr "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs"
#>   .. ..$ comment: chr "PROJCRS[\"OSGB 1936 / British National Grid\",\n    BASEGEOGCRS[\"OSGB 1936\",\n        DATUM[\"OSGB 1936\",\n "| __truncated__
#>  - attr(*, "class")= chr "inla.mesh.segment"

The england.bdry is a inla.mesh.segment-class object that contains the CRS associated with the spatial-class object used to define it (BNG in this case). However, if we build the mesh and DO NOT specify the CRS, the mesh won’t be have any CRS associated with it and will result in unexpected warnings or errors when trying to predict on a smaller grid that has a CRS. You will usually get a bunch of warnings (see https://rsbivand.github.io/ECS530_h20/ECS530_III.html and https://psfaculty.plantsciences.ucdavis.edu/plant/additionaltopics_datumwarning.pdf for details on why you get these warnings and how to mute them). Unfortunately, this is something we will need to hold on for the moment while INLA source code migrates to the sf environment somewhere in the future (or maybe something inlabru can take over?).

max.edge = 20000
mesh = inla.mesh.2d(boundary = england.bdry, 
                    offset = 50000,
                    max.edge = c(1.5, 2.5) * max.edge,
                    cutoff = 15000,
                    crs = BNGproj)
plot(mesh)

At this stage we can now fit our LGCP. In principle we do no have to worry about CRS here. However, it is true that the choice of priors is closely related to the spatial scale of the study and the units (e.g. meters or km) in which the data has been projected. So, the CRS projection should be handled properly, for which I suggest using the sf environment first to do the projections or change units and then convert the appropriate object to a sp spatial class.

library(inlabru)
matern <- inla.spde2.pcmatern(mesh,
                              prior.sigma = c(1, 0.01),
                              prior.range = c(5, 0.01)) 


cmp <- coordinates ~  spat(main = coordinates, model = matern) + Intercept(1)

fit = lgcp(cmp, data = intensity_subset,samplers = England_BNG,domain = list(coordinates = mesh) )
plot(fit, "Intercept")

I will not go into the modelling detail as this will be probably be covered on a different session. So I’ll just focus on some of the function that depend on the different spatial-structure and CRS. For example, if we wanted to predict the intensity, inlabru uses the posterior samples to evaluate on the fine grid. In this example we will use pixels()function from inlabru to create a SpatialPixels class-object over the mesh we’ve just created. (Notice that if you did not specify any CRS for your mesh, an error will occur if the Spatial object you are using as a mask contains a CRS attribute).

pxl<- pixels(mesh, mask = England_BNG,nx = 150,ny = 150)
plot(geometry(pxl))

pred <- predict(fit,pxl,
                formula ~ data.frame(
    lambda = exp(spat + Intercept),
    loglambda = spat + Intercept
  )
)

ggplot() +
  gg(pred$lambda) +
  gg(England_BNG) +
  ggtitle("LGCP fit to Points", subtitle = "(Response Scale)") +
  coord_fixed()
#> Regions defined for each Polygons

sf package

Now we will take a look at the different steps we took above using the sf environment instead.

Any shape file in your system can be read direct using the st_read(). In this example, I’ll convert the England SpatialPolygonsDataFrame to a simple feature object that we can manipulate and visualize within the tidyverse DSLs. The CRS for spatial objects of class sf or stars can be retrieved using the st_crs function , or be set or changed via st_set_crs using pipeline command (notice that simply replacing the CRS does not reproject the data, we should use st_transform for this means).

In the code below I:

# build an sf object
England_sf = st_as_sf(England) %>%  st_transform(crs = 27700)
England_sf = st_transform(England_sf,gsub("units=m","units=km",st_crs(England_sf)$proj4string)) 

England_mainland <- ms_filter_islands(England_sf, min_area = 2000)
England_mainland
#> Simple feature collection with 1 feature and 10 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 134.0774 ymin: 11.09554 xmax: 655.6956 ymax: 656.7911
#> CRS:           +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=km +no_defs
#>   GID_0         NAME_0   GID_1  NAME_1 VARNAME_1 NL_NAME_1
#> 1   GBR United Kingdom GBR.1_1 England      <NA>      <NA>
#>                            TYPE_1 ENGTYPE_1 CC_1 HASC_1
#> 1 Home Nation|Constituent Country   Kingdom <NA>   <NA>
#>                         geometry
#> 1 POLYGON ((564.9785 102.5189...
ggplot()+
  geom_sf(data=England_mainland)

We can query information on the CRS and projection as follows:

#  retrieve the PROJ.4 attribute
st_crs(England_mainland)$proj4string
#> [1] "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=km +no_defs"
# check whether  longitude-latitude projection is still being applied
st_is_longlat(England_mainland)
#> [1] FALSE
# Check the spatial units of our projection
st_crs(England_mainland)$units
#> [1] "km"

Now we define a simple feature object from our data points and re project it using the CRS we defined above (notice that Geometry type informs us about the “spatial class” in a similar manner to sp spatial class objects.


intensity.proj <- st_as_sf(x = intensity,                         
                  coords = c("long", "lat"),
                  crs = "+proj=longlat +datum=WGS84")%>%  st_transform(st_crs(England_mainland))

intensity_subset <- intensity.proj[England_mainland, ]
intensity_subset
#> Simple feature collection with 13180 features and 2 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 182.146 ymin: 35.5694 xmax: 654.6209 ymax: 653.4977
#> CRS:           +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=km +no_defs
#> First 10 features:
#>          OSGR Province                  geometry
#> 1      SP8026  England POINT (480.3931 226.5483)
#> 2      SX8964  England POINT (289.4176 64.56643)
#> 3    TQ261685  England POINT (526.0376 168.6053)
#> 4  SW99957528  England POINT (199.8838 75.34989)
#> 5      TQ2668  England POINT (526.3875 168.5553)
#> 6      TQ1303  England POINT (513.3894 103.5627)
#> 7      TF8843  England POINT (588.3791 343.5351)
#> 8    SD120813  England POINT (311.9634 481.3678)
#> 9      TF8844  England   POINT (588.379 344.535)
#> 10     TG0820  England POINT (608.3766 320.5378)

The final step is to transform sf-class objects to a sp spatial-structure. The we can use this object to produce the mesh and fit our model.

# Build the mesh

England_mainland_sp <- as(England_mainland, "Spatial")
intensity_subset_sp <-as(intensity_subset, "Spatial")

england.bdry <- England_mainland_sp %>% inla.sp2segment()

max.edge = 20
mesh = inla.mesh.2d(boundary = england.bdry, 
                    crs = st_crs(England_mainland),offset = 50,
                    max.edge = c(1.5, 2.5) * max.edge,
                    cutoff = 15)

plot(mesh)


matern <- inla.spde2.pcmatern(mesh,
                              prior.sigma = c(1, 0.01),
                              prior.range = c(0.5, 0.01)) 


cmp <- coordinates ~  spat(main = coordinates, model = matern) + Intercept(1)


fit = lgcp(cmp, data = intensity_subset_sp,samplers = England_mainland_sp,domain = list(coordinates = mesh) )

Recall that a SpatialPixelsDataFrame, created with the pixel function in inlabru, was used to make this predictions. I included a few lines to this function so it can work internally with both sp and sf spatial structures. Maybe functions such as inlabru::predict(), inla.mesh.segment() and inla.mesh.create() could incorporate some of this compatibility with sf in future updates.

pixels_sf <- function (mesh, nx = 150, ny = 150, mask = TRUE) 
{
  if (length(nx) == 1) {
    x <- seq(min(mesh$loc[, 1]), max(mesh$loc[, 1]), length = nx)
  }
  else {
    x <- nx
  }
  if (length(ny) == 1) {
    y <- seq(min(mesh$loc[, 2]), max(mesh$loc[, 2]), length = ny)
  }
  else {
    y <- ny
  }
  pixels <- expand.grid(x = x, y = y)
  pixels_sf<- st_as_sf(x = pixels,                         
                       coords = c("x", "y"))
  # old sp specification
  #coordinates(pixels) <- c("x", "y")
  
  if (!is.null(mesh$crs)) {
  # updated sp specification
  #slot(pixels, "proj4string") <- mesh$crs
  st_crs(pixels_sf) <- mesh$crs
  }
  if (is.logical(mask)) {
    if (mask) {
      pixels <- pixels[is.inside(mesh, coordinates(pixels))]
    }
  }
  else {
    
    if (inherits(mask, "SpatialPolygonsDataFrame")) {
      mask <- as(mask, "SpatialPolygons")
      mask <- st_as_sf(mask)
      
    } else if ( inherits(mask, "sf")){
      mask <- mask
    }
    pixels <- st_filter(pixels_sf, mask)
    # old sp clipping  function
    #pixels <- pixels[!is.na(sp::over(pixels, mask))]
  }
  
  
  pixels <- as(pixels, "Spatial")
  #pixels <- as(pixels, "SpatialPixelsDataFrame")
  pixels <- sp::SpatialPixelsDataFrame(pixels, data = data.frame(matrix(nrow = NROW(pixels), ncol = 0)))
  pixels
}

pxl<- pixels_sf(mesh,mask = England_mainland,nx = 300,ny = 300) 

class(pxl)
#> [1] "SpatialPixelsDataFrame"
#> attr(,"package")
#> [1] "sp"

pred <- predict(fit,pxl,
                formula ~ data.frame(
    lambda = exp(spat + Intercept),
    loglambda = spat + Intercept
  )
)

ggplot() +
  gg(pred$lambda) +
  ggtitle("LGCP fit to Points", subtitle = "(Response Scale)") +
  coord_fixed()