Building the mesh

Jafet Belmont

2022-04-05

The SPDE approach relies discretizing the space by defining a mesh that creates an artificial set of neighbours over the study area that allows for the spatial autocorrelation between observation to be calculated. How the mesh is constructed will have an important impact on the inference and predictions we make. Thus it is important to create a good mesh to ensure results are not sensible to the mesh itself. While the mesh construction vary from case to case, there have been some guidelines to create an optimal mesh. There are several arguments that can be used to build the mesh. This vignette will only cover a two-dimensional mesh construction using the inla.mesh.2d. function. However, a one-dimensional mesh specification can be created using the inla.mesh.1d function (I intend to cover this in future tutorials for spatio-temporal models). The arguments for a two-dimensional mesh construction are the followings:

args(inla.mesh.2d)
function (loc = NULL, loc.domain = NULL, offset = NULL, n = NULL,
boundary = NULL, interior = NULL, max.edge, min.angle = NULL,
cutoff = 1e-12, plot.delay = NULL)

First, some reference about the study region is needed, which can be provided by either:

Note that if either (1) the location of points or (2) the domain extent are specified, the mesh will be constructed based on a convex hull (a polygon of triangles out of the domain area). Alternatively, it possible to include a non-convex hull as a boundary in the mesh construction instead of the location or loc.domain arguments. This will result in the triangulation to be constrained by the boundary. A non-convex hull mesh can also be created by building a boundary for the points using the inla.nonconvex.hull() function. Finally, the other compulsory argument that needs to be specified is the max.edge which determines the largest allowed triangle length (the lower the value for max.edge the higher the resolution). The value supplied to this argument can be either a scalar, in which case the value controls the triangle edge lengths in the inner domain, or a length two vector that controls the edge lengths in the inner domain and in the outer extension respectively. Notice that The value (or values) passed to the max.edge function must be on the same scale unit as the coordinates. To illustrate the different options when building a mesh I will use the number of dragonflies records on the British Dragonfly Society Recording Scheme (2020) in the west coast of England.

#> Loading required package: sp
#> Registered S3 method overwritten by 'dplyr':
#>   method         from       
#>   print.location geojsonlint
#> 
#> Attaching package: 'tidyr'
#> The following object is masked from 'package:raster':
#> 
#>     extract
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE
#> Loading required package: Matrix
#> 
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#> 
#>     expand, pack, unpack
#> Loading required package: foreach
#> Loading required package: parallel
#> This is INLA_22.02.28-1 built 2022-02-28 17:18:16 UTC.
#>  - See www.r-inla.org/contact-us for how to get help.

First, we will create a simple mesh using the locations where species have been observed. These can be supplied either as a spatial point sp-class object or as a vector of coordinates. Since the spatial points have been defined as a sf (simple feature) object we can (1) convert them to a sp-class object (e.g. by writing as(intensity_sf, "Spatial")) or extract the coordinates using the st_coordinates function. While there is no general rule for setting a correct value for the max.edge, a value for max.edge that is too close to the spatial range will make the task of fitting a smooth SPDE difficult. On the other hand, if the max.edge value is too small compared to the spatial range, the mesh will have a large number of vertices leading to a more computationally demanding fitting process (which might not necessarily lead to better results). Thus, it is better to begin the analysis with a coarse matrix and evaluate the model on a finer grid as a final step. Several studies, tutorials and group discussions have suggested the max.edge value to be between 1/3 to 1/10 times smaller than the spatial range (https://haakonbakkagit.github.io/btopic104.html). However, since the range cannot be known until the model has been fitted we can make a somewhat conservative assumption that it is about 1/3 of the study area and thus specify the max.edge value to be 1/5 of the spatial range (notice that the range parameter in INLA GMRF is the distance at which the correlation drops to about 0.13).

max.edge = diff(range(st_coordinates(intensity_sf)[,1]))/(3*5)

mesh1 = inla.mesh.2d(loc=st_coordinates(intensity_sf),
                    max.edge = max.edge)
ggplot() +
     geom_sf(data=England_crop,color='turquoise',fill='transparent')+  
  gg(mesh1) +
  geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5) 

Notice that the problem with this mesh is that there is a rather large number of points too close to the boundary. This can cause a boundary effect in which the variance is twice as larger at the border than within the domain. Thus, we can also specify an outer layer with a lower triangle density (i.e. where no points occur) to avoid this boundary effect. This can be done by supplying a vector of two values so that the spatial domain is divided into an inner and an outer area. Here, we will define the max.edge such that the outer layer will have a triangle density two times lower than than the inner layer (i.e. twice the length for the outer layer edges).


mesh2 = inla.mesh.2d(loc=st_coordinates(intensity_sf),
                    max.edge = c(1,2)*max.edge)
ggplot() +
  gg(mesh2) +
  geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)

By doing this, we avoid boundary effects by extending the original spatial domain without increasing too much the computational costs. If we had defined the same triangle density in both inner and outer layers, then we would have been wasting too much computational effort to get precise approximations in the outer extension where no actual points occur. Lindgren and Rue (2015), suggest for the domain to be extended by a distance at least equal to the range to avoid the boundary effect. The amount in which the domain should be extended in the inner and outer part can be controlled with the offset argument of the inla.mesh.2d function. For this example we will expand the inner layer by the same amount as the max.edge and the outer layer by the range we assumed when defining the inner max.edge value (i.e. 1/3 of the spatial extent).

bound.outer = diff(range(st_coordinates(intensity_sf)[,1]))/3

mesh3 = inla.mesh.2d(loc=st_coordinates(intensity_sf),
                    max.edge = c(1,2)*max.edge,
                    offset=c(max.edge, bound.outer))
ggplot() +
  gg(mesh3) +
  geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)

Another potential problem we need to sort out occurs when the nodes in the mesh are too close to each other. This can happen because the mesh tries to put one node at each location supplied in loc argument. When species occurrences in this case, are recorded at locations are very close to each other, a mesh with a large number of vertices will be created at the location where species occur. We can change the cutoff value to avoid building too many small triangles around clustered data points. The cutoff value is shortest allowed distance between points because points with a distance less than the cutoff will be considered a single vertex. For this example, I will replace points with distance less than 1 km by a single vertex. In practice, some authors have suggested to begin with a cutoff equal to 1/5 of the max.edge value of the inner domain and make adjustments as necessary.


mesh4 = inla.mesh.2d(loc=st_coordinates(intensity_sf),
                    max.edge = c(1,2)*max.edge,
                    offset=c(max.edge, bound.outer),
                    cutoff = max.edge/5)
ggplot() +
  gg(mesh4) +
  geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)

Righettoa et al (2018) investigated the impact of different mesh specifications on parameter estimation and prediction through a simulation study. From the different arguments used to build a mesh they found that the cutoff and maximal edge length in the inner domain (conditional on the cutoff) had the largest impact on the results compared to the maximal edge length in the outer domain which had little effect on this.

Another way to create convex hull meshes is to supply a single polygon to determine the domain extent using the loc.domain argument. With this approach, we can change the n argument to set the initial number of points in the extended boundary to have triangles as regular as possible in size and shape. As an example, I will create a bounding box to define a polygon that cover the locations where species have been observed in the study area and vary the number of points in the boundary .


pol =   st_bbox(intensity_sf) %>%  st_as_sfc() 

mesh5 = inla.mesh.2d(loc.domain = st_coordinates(pol)[,1:2],
                    max.edge = c(1,2)*max.edge,
                    offset=c(max.edge, bound.outer),
                    cutoff = max.edge/5)

mesh6 = inla.mesh.2d(loc.domain = st_coordinates(pol)[,1:2],
                    n=7,
                    max.edge = c(1,2)*max.edge,
                    offset=c(max.edge, bound.outer),
                    cutoff = max.edge/5)

mesh7 = inla.mesh.2d(loc.domain = st_coordinates(pol)[,1:2],
                    n=9,
                    max.edge = c(1,2)*max.edge,
                    offset=c(max.edge, bound.outer),
                    cutoff = max.edge/5)

A problem with the meshes we have constructed so far, is that domain extends past the coastline into the sea. This implies that the field exists also over the sea. Since dragonflies do not occur beyond the coastline, we would want to avoid including any part of the sea in the mesh so the Gaussian field doesn’t smooths over the sea (the same reasoning applies for aquatic species that exit only in the sea or waterbodies). The solution to this is to construct a mesh that take this boundaries into consideration. This is achieve by creating a non-convex hull. In some situation building a mesh based on a non-convex hull can avoid having many small triangles outside the domain of interest. Boundaries can be creating around the locations of points using the inla.nonconvex.hull function. The parameters to control the shape of the boundary (i.e. convexity, concavity and resolution) are clearly explained in:

This type of mesh can be helpful for instance, when modelling points that occur in separate islands from the mainland.

domain <- inla.nonconvex.hull(as.matrix(st_coordinates(intensity_sf)),
                              concave = -.07,convex = -0.05, resolution=c(100,100))

mesh8 <- inla.mesh.2d(boundary = domain,
                    max.edge = c(1,2)*max.edge,
                    offset=c(max.edge, bound.outer),
                    cutoff = max.edge/5)

ggplot() +
  gg(mesh8) +
  geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)

In a lot of situation, the domain region is available as a map and thus, it is more convenient to use the boarders of the map as boundaries. In the next example I will illustrate a mesh creation including the coastline of West-England map.

To specify the coastline polygon as a boundary in the mesh we first need to create a sp SpatialPolygons/SpatialPolygonsDataFrame-class object (e.g. if the original shapefile or polygon is a sf-class object you will need to convert it to a sp-clas ). Then, we can use the inla.sp2segment() function to extract the boundary of the SpatialPolygons and supplied this in the boundary argument of the inla.mesh.2d function.


england.bdry <-  as(England_crop, "Spatial") %>% inla.sp2segment()


mesh9 <- inla.mesh.2d(boundary = england.bdry,
                    max.edge = c(1,2)*max.edge,
                    offset=c(max.edge, bound.outer),
                    cutoff = max.edge/5)

ggplot() +
  gg(mesh9) +
  geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)

There are a few considerations that need to be kept in mind when using a boarder to build a mesh. First, the cutoff value is no longer affected by the distance between points but the boundary polygon itself. Thus, we might need to reduce cutoff value to achieve a higher the precision of the coastline.

mesh10 <- inla.mesh.2d(boundary = england.bdry,
                    max.edge = c(1,2)*max.edge,
                    offset=c(max.edge, bound.outer),
                    cutoff =0.5)
# number of nodes
mesh10$n
#> [1] 1084

ggplot() +
  gg(mesh10) +
  geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)

In the previous mesh, locations were not always at the mesh nodes. So, you can make each location a node in the mesh by supplying the coordinates of each points in the loc argument as we did with the convex hull meshes before. This of course will increase the number of nodes or triangles in the inner layer and therefore increasing the computational cost.

mesh11 <- inla.mesh.2d(boundary = england.bdry,
                      loc=st_coordinates(intensity_sf),
                    max.edge = c(1,2)*max.edge,
                    offset=c(max.edge, bound.outer),
                    cutoff =0.5)
mesh11$n
#> [1] 1191

ggplot() +
  gg(mesh11) +
  geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)

However, as mentioned earlier, it is better to begin the analysis with a coarse mesh and then update the mesh parameters according to the estimated range (e.g. let the max.edge to be between 1/5 and 1/10 of the estimated range and then update the offset and cutoff values accordingly). Lets try increasing the cutoff and the maximum edge length in the outer layer.

mesh12 <- inla.mesh.2d(boundary = england.bdry,
                      loc=st_coordinates(intensity_sf),
                    max.edge = c(1,3)*max.edge,
                    offset=c(max.edge, bound.outer),
                    cutoff =0.65,
                    crs=st_crs(England_crop))
#> Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj
#> = prefer_proj): Discarded ellps unknown in Proj4 definition: +proj=geocent +R=1
#> +units=m +no_defs +type=crs
#> Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
#> prefer_proj): Discarded datum unknown in Proj4 definition
mesh12$n
#> [1] 863

ggplot() +
  gg(mesh12) +
  geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)

Barrier Model

To define a Barrier Model (i.e. a non-stationary model) we need to define a Polygon covering the study area. We do this by selecting the triangles of the mesh that are inside the polygon of interest with the inla.over_sp_mesh() function. Then, the land triangles from all triangles are removed to obtain the barrier ones (here named barrier.tri()). Finally, we can create and visualize the barrier by building a polygon containing all pieces of land using the inla.barrier.polygon() function.


land.tri = inla.over_sp_mesh(as(England_crop, "Spatial"), y = mesh12, 
                              type = "centroid", ignore.CRS = TRUE) # note that this function recieves a sp-spatialPolygon object
num.tri = length(mesh12$graph$tv[, 1])
barrier.tri = setdiff(1:num.tri, land.tri)
poly.barrier = inla.barrier.polygon(mesh12, 
                                    barrier.triangles = barrier.tri)


ggplot()+
  gg(mesh12)+
  gg(poly.barrier,alpha=0.7)