The SPDE approach relies discretizing the space by defining a mesh
that creates an artificial set of neighbours over the study area that
allows for the spatial autocorrelation between observation to be
calculated. How the mesh is constructed will have an important impact on
the inference and predictions we make. Thus it is important to create a
good mesh to ensure results are not sensible to the mesh itself. While
the mesh construction vary from case to case, there have been some
guidelines to create an optimal mesh. There are several arguments that
can be used to build the mesh. This vignette will only cover a
two-dimensional mesh construction using the
inla.mesh.2d.
function. However, a one-dimensional
mesh specification can be created using the
inla.mesh.1d
function (I intend to cover this in
future tutorials for spatio-temporal models). The arguments for a
two-dimensional mesh construction are the followings:
args(inla.mesh.2d)
function (loc = NULL, loc.domain = NULL, offset = NULL, n = NULL,
boundary = NULL, interior = NULL, max.edge, min.angle = NULL,
cutoff = 1e-12, plot.delay = NULL)
First, some reference about the study region is needed, which can be provided by either:
loc
argument.loc.domain
argument.boundary
argument.Note that if either (1) the location of points or (2) the domain
extent are specified, the mesh will be constructed based on a convex
hull (a polygon of triangles out of the domain area). Alternatively, it
possible to include a non-convex hull as a boundary in the mesh
construction instead of the location or loc.domain arguments. This will
result in the triangulation to be constrained by the boundary. A
non-convex hull mesh can also be created by building a boundary for the
points using the inla.nonconvex.hull()
function.
Finally, the other compulsory argument that needs to be specified is the
max.edge
which determines the largest allowed
triangle length (the lower the value for max.edge
the higher the resolution). The value supplied to this argument can be
either a scalar, in which case the value controls the triangle edge
lengths in the inner domain, or a length two vector that controls the
edge lengths in the inner domain and in the outer extension
respectively. Notice that The value (or values) passed to the
max.edge
function must be on the same scale unit
as the coordinates. To illustrate the different options when building a
mesh I will use the number of dragonflies records on the British
Dragonfly Society Recording Scheme (2020) in the west coast of
England.
#> Loading required package: sp
#> Registered S3 method overwritten by 'dplyr':
#> method from
#> print.location geojsonlint
#>
#> Attaching package: 'tidyr'
#> The following object is masked from 'package:raster':
#>
#> extract
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE
#> Loading required package: Matrix
#>
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#>
#> expand, pack, unpack
#> Loading required package: foreach
#> Loading required package: parallel
#> This is INLA_22.02.28-1 built 2022-02-28 17:18:16 UTC.
#> - See www.r-inla.org/contact-us for how to get help.
First, we will create a simple mesh using the locations where species
have been observed. These can be supplied either as a spatial point
sp
-class object or as a vector of coordinates.
Since the spatial points have been defined as a sf
(simple feature) object we can (1) convert them to a
sp
-class object (e.g. by writing
as(intensity_sf, "Spatial")
) or extract the
coordinates using the st_coordinates
function.
While there is no general rule for setting a correct value for the
max.edge
, a value for
max.edge
that is too close to the spatial range
will make the task of fitting a smooth SPDE difficult. On the other
hand, if the max.edge
value is too small compared
to the spatial range, the mesh will have a large number of vertices
leading to a more computationally demanding fitting process (which might
not necessarily lead to better results). Thus, it is better to begin the
analysis with a coarse matrix and evaluate the model on a finer grid as
a final step. Several studies, tutorials and group discussions have
suggested the max.edge
value to be between 1/3 to
1/10 times smaller than the spatial range (https://haakonbakkagit.github.io/btopic104.html).
However, since the range cannot be known until the model has been fitted
we can make a somewhat conservative assumption that it is about 1/3 of
the study area and thus specify the max.edge
value
to be 1/5 of the spatial range (notice that the range parameter in INLA
GMRF is the distance at which the correlation drops to about 0.13).
= diff(range(st_coordinates(intensity_sf)[,1]))/(3*5)
max.edge
= inla.mesh.2d(loc=st_coordinates(intensity_sf),
mesh1 max.edge = max.edge)
ggplot() +
geom_sf(data=England_crop,color='turquoise',fill='transparent')+
gg(mesh1) +
geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)
Notice that the problem with this mesh is that there is a rather large
number of points too close to the boundary. This can cause a boundary
effect in which the variance is twice as larger at the border than
within the domain. Thus, we can also specify an outer layer with a lower
triangle density (i.e. where no points occur) to avoid this boundary
effect. This can be done by supplying a vector of two values so that the
spatial domain is divided into an inner and an outer area. Here, we will
define the
max.edge
such that the outer layer will
have a triangle density two times lower than than the inner layer
(i.e. twice the length for the outer layer edges).
= inla.mesh.2d(loc=st_coordinates(intensity_sf),
mesh2 max.edge = c(1,2)*max.edge)
ggplot() +
gg(mesh2) +
geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)
By doing this, we avoid boundary effects by extending the original
spatial domain without increasing too much the computational costs. If
we had defined the same triangle density in both inner and outer layers,
then we would have been wasting too much computational effort to get
precise approximations in the outer extension where no actual points
occur. Lindgren and Rue (2015), suggest for the domain to be extended by
a distance at least equal to the range to avoid the boundary effect. The
amount in which the domain should be extended in the inner and outer
part can be controlled with the offset
argument of
the inla.mesh.2d
function. For this example we
will expand the inner layer by the same amount as the
max.edge
and the outer layer by the range we
assumed when defining the inner max.edge
value
(i.e. 1/3 of the spatial extent).
= diff(range(st_coordinates(intensity_sf)[,1]))/3
bound.outer
= inla.mesh.2d(loc=st_coordinates(intensity_sf),
mesh3 max.edge = c(1,2)*max.edge,
offset=c(max.edge, bound.outer))
ggplot() +
gg(mesh3) +
geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)
Another potential problem we need to sort out occurs when the nodes
in the mesh are too close to each other. This can happen because the
mesh tries to put one node at each location supplied in
loc
argument. When species occurrences in this
case, are recorded at locations are very close to each other, a mesh
with a large number of vertices will be created at the location where
species occur. We can change the cutoff
value to
avoid building too many small triangles around clustered data points.
The cutoff
value is shortest allowed distance
between points because points with a distance less than the cutoff will
be considered a single vertex. For this example, I will replace points
with distance less than 1 km by a single vertex. In practice, some
authors have suggested to begin with a cutoff equal to 1/5 of the
max.edge
value of the inner domain and make
adjustments as necessary.
= inla.mesh.2d(loc=st_coordinates(intensity_sf),
mesh4 max.edge = c(1,2)*max.edge,
offset=c(max.edge, bound.outer),
cutoff = max.edge/5)
ggplot() +
gg(mesh4) +
geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)
Righettoa et al (2018) investigated the impact of different
mesh specifications on parameter estimation and prediction through a
simulation study. From the different arguments used to build a mesh they
found that the cutoff
and maximal edge length in
the inner domain (conditional on the cutoff) had the largest impact on
the results compared to the maximal edge length in the outer domain
which had little effect on this.
Another way to create convex hull meshes is to supply a single
polygon to determine the domain extent using the
loc.domain
argument. With this approach, we can
change the n
argument to set the initial number of
points in the extended boundary to have triangles as regular as possible
in size and shape. As an example, I will create a bounding box to define
a polygon that cover the locations where species have been observed in
the study area and vary the number of points in the boundary .
= st_bbox(intensity_sf) %>% st_as_sfc()
pol
= inla.mesh.2d(loc.domain = st_coordinates(pol)[,1:2],
mesh5 max.edge = c(1,2)*max.edge,
offset=c(max.edge, bound.outer),
cutoff = max.edge/5)
= inla.mesh.2d(loc.domain = st_coordinates(pol)[,1:2],
mesh6 n=7,
max.edge = c(1,2)*max.edge,
offset=c(max.edge, bound.outer),
cutoff = max.edge/5)
= inla.mesh.2d(loc.domain = st_coordinates(pol)[,1:2],
mesh7 n=9,
max.edge = c(1,2)*max.edge,
offset=c(max.edge, bound.outer),
cutoff = max.edge/5)
A problem with the meshes we have constructed so far, is that domain
extends past the coastline into the sea. This implies that the field
exists also over the sea. Since dragonflies do not occur beyond the
coastline, we would want to avoid including any part of the sea in the
mesh so the Gaussian field doesn’t smooths over the sea (the same
reasoning applies for aquatic species that exit only in the sea or
waterbodies). The solution to this is to construct a mesh that take this
boundaries into consideration. This is achieve by creating a non-convex
hull. In some situation building a mesh based on a non-convex hull can
avoid having many small triangles outside the domain of interest.
Boundaries can be creating around the locations of points using the
inla.nonconvex.hull
function. The parameters to
control the shape of the boundary (i.e. convexity, concavity and
resolution) are clearly explained in:
This type of mesh can be helpful for instance, when modelling points that occur in separate islands from the mainland.
<- inla.nonconvex.hull(as.matrix(st_coordinates(intensity_sf)),
domain concave = -.07,convex = -0.05, resolution=c(100,100))
<- inla.mesh.2d(boundary = domain,
mesh8 max.edge = c(1,2)*max.edge,
offset=c(max.edge, bound.outer),
cutoff = max.edge/5)
ggplot() +
gg(mesh8) +
geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)
In a lot of situation, the domain region is available as a map and thus, it is more convenient to use the boarders of the map as boundaries. In the next example I will illustrate a mesh creation including the coastline of West-England map.
To specify the coastline polygon as a boundary in the mesh we first
need to create a sp
SpatialPolygons
/SpatialPolygonsDataFrame
-class
object (e.g. if the original shapefile or polygon is a
sf
-class object you will need to convert it to a
sp
-clas ). Then, we can use the
inla.sp2segment()
function to extract the boundary
of the SpatialPolygons and supplied this in the
boundary
argument of the
inla.mesh.2d
function.
<- as(England_crop, "Spatial") %>% inla.sp2segment()
england.bdry
<- inla.mesh.2d(boundary = england.bdry,
mesh9 max.edge = c(1,2)*max.edge,
offset=c(max.edge, bound.outer),
cutoff = max.edge/5)
ggplot() +
gg(mesh9) +
geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)
There are a few considerations that need to be kept in mind when
using a boarder to build a mesh. First, the cutoff
value is no longer affected by the distance between points but the
boundary polygon itself. Thus, we might need to reduce
cutoff
value to achieve a higher the precision of
the coastline.
<- inla.mesh.2d(boundary = england.bdry,
mesh10 max.edge = c(1,2)*max.edge,
offset=c(max.edge, bound.outer),
cutoff =0.5)
# number of nodes
$n
mesh10#> [1] 1084
ggplot() +
gg(mesh10) +
geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)
In the previous mesh, locations were not always at the mesh nodes.
So, you can make each location a node in the mesh by supplying the
coordinates of each points in the loc
argument as
we did with the convex hull meshes before. This of course will increase
the number of nodes or triangles in the inner layer and therefore
increasing the computational cost.
<- inla.mesh.2d(boundary = england.bdry,
mesh11 loc=st_coordinates(intensity_sf),
max.edge = c(1,2)*max.edge,
offset=c(max.edge, bound.outer),
cutoff =0.5)
$n
mesh11#> [1] 1191
ggplot() +
gg(mesh11) +
geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)
However, as mentioned earlier, it is better to begin the analysis
with a coarse mesh and then update the mesh parameters according to the
estimated range (e.g. let the max.edge
to be
between 1/5 and 1/10 of the estimated range and then update the
offset
and cutoff
values
accordingly). Lets try increasing the cutoff and the maximum edge length
in the outer layer.
<- inla.mesh.2d(boundary = england.bdry,
mesh12 loc=st_coordinates(intensity_sf),
max.edge = c(1,3)*max.edge,
offset=c(max.edge, bound.outer),
cutoff =0.65,
crs=st_crs(England_crop))
#> Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj
#> = prefer_proj): Discarded ellps unknown in Proj4 definition: +proj=geocent +R=1
#> +units=m +no_defs +type=crs
#> Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
#> prefer_proj): Discarded datum unknown in Proj4 definition
$n
mesh12#> [1] 863
ggplot() +
gg(mesh12) +
geom_sf(data=intensity_sf,col='purple',size=1.7,alpha=0.5)
To define a Barrier Model (i.e. a non-stationary model) we need to
define a Polygon covering the study area. We do this by selecting the
triangles of the mesh that are inside the polygon of interest with the
inla.over_sp_mesh()
function. Then, the land
triangles from all triangles are removed to obtain the barrier ones
(here named barrier.tri()
). Finally, we can create
and visualize the barrier by building a polygon containing all pieces of
land using the inla.barrier.polygon()
function.
= inla.over_sp_mesh(as(England_crop, "Spatial"), y = mesh12,
land.tri type = "centroid", ignore.CRS = TRUE) # note that this function recieves a sp-spatialPolygon object
= length(mesh12$graph$tv[, 1])
num.tri = setdiff(1:num.tri, land.tri)
barrier.tri = inla.barrier.polygon(mesh12,
poly.barrier barrier.triangles = barrier.tri)
ggplot()+
gg(mesh12)+
gg(poly.barrier,alpha=0.7)