Ecological studies usually involve the analysis of data that are
geographically referenced. Over the last decade, different packages have
been developed to handle geographic data in R with the
sp package being for many years one of the key
geospatial R packages for handling spatial data. However, new
technologies have shown an important increase in the accuracy of
georeferencing, leading to a more precise coordinate representation. In
consequence, the code in which coordinate reference systems (CRS) are
written has been updated. To have a better understanding of what these
changes are and a bit of the history behind, I recommend the following
reading (this document also explains some of the warning you get when
working with the sp package):
https://psfaculty.plantsciences.ucdavis.edu/plant/additionaltopics_datumwarning.pdf
In summary, recent changes in GDAL and
PROJ packages (and in consequence any package that
depended on these ones) do not longer support using
proj4strings to represent a CRS because it cannot
hold all the details of a projection. Intead, the current approach to
represent a CRS is with the WKT2 strings, a
standardized text-based format developed by the Open Geospatial
Consortium (OGC). However, coding the WKT2
manually can be difficult due to the complexity involved in the CRS
specification. A common aproach to address this is to specify the EPSG
code (European Petroleum Survey Group) which is a uniform mapping system
that identifies each coordinate refence system and projections. As a
consequence of all of these changes, the OGC developed the
sf package as standardized way to encode spatial
data that will envetualy replace the sp and
rgeos packages. The sf has
gained a lot of support and different resources and tutorials have
become available. Here are some tutorials you might find useful to get
you started with the sf library.
The sf package provides a more convenient and
flexible framework for handling spatial data by defining more intuitive
spatial data-structures compared to sp. However
sf is somewhat new, and a lot packages still rely
on the spatial classes defined by the sp package
(such as INLA and inlabru).
Since the sp is no longer supporeted, we should
see an increasing shift towards sf in the nearby
future. In the meantime, users have suggested to do most of the spatial
operations in sf and convert them to a
sp spatial class when needed.
In the next few sections I will go throught some examples for
handling spatial objects, CRS and projections using
sf and sp spatial classes
and their implementation with INLA and
inlabru. I will fit a LGCP using
inlabru to estimate the sampling intensity in the
British Dragonfly Society Recording Scheme (2020). This case of study
illustrates how different spatial strucutres interact within
INLA and how the CRS and projections should be
handled according to the new updated GDAL and
PROJ libraries .
sp packageReading a shapefile through sp or using
pre-exiting data sets will typlically create a
SpatialPolygonsDataFrame class object. I will
upload a short guide about how to work with spatial objects in R I made
a few years back that explain the different spatial structures
implemented in the sp package (which now seems
pointless but gives a good idea of why sf is more
intuitive and integrates well with DSL such as ggplot and other
tidyverse libraries).
I will use the raster library to retrieve the
border of England which can be used to define a non-stationary model.
Notice that the terra package, which is compatible
with the new changes in GDAL and
PROJ, has now been created as a replacement for
the raster library. So I will only use the raster
package to retrieve the data. For simplicifty, I will remove all of the
smaller detached islands by using the
ms_filter_islands function in the
rmapshaper package (this package fully supports sf
or sfc polygons object as well).
library(raster)
library(rmapshaper)
library(tidyr)
uk_mask <- getData('GADM', country='GBR', level=1)
England <- uk_mask[uk_mask$NAME_1 == "England",]
class(England)
#> [1] "SpatialPolygonsDataFrame"
#> attr(,"package")
#> [1] "sp"
England_mainland_sp <- ms_filter_islands(England, min_area = 12391399903)
plot(England_mainland_sp)We can get the CRS of a Spatial object (i.e. using the WKT2 string
format) in sp as follows:
cat(wkt(England_mainland_sp))
#> GEOGCRS["unknown",
#> DATUM["World Geodetic System 1984",
#> ELLIPSOID["WGS 84",6378137,298.257223563,
#> LENGTHUNIT["metre",1]],
#> ID["EPSG",6326]],
#> PRIMEM["Greenwich",0,
#> ANGLEUNIT["degree",0.0174532925199433],
#> ID["EPSG",8901]],
#> CS[ellipsoidal,2],
#> AXIS["longitude",east,
#> ORDER[1],
#> ANGLEUNIT["degree",0.0174532925199433,
#> ID["EPSG",9122]]],
#> AXIS["latitude",north,
#> ORDER[2],
#> ANGLEUNIT["degree",0.0174532925199433,
#> ID["EPSG",9122]]]]We want to set the CRS to the British National Grid (BNG) which has an EPSG code of 27700. We will apply this CRS to all of our spatial objects to ensure they are all on the same CRS.
BNGproj<- CRS(SRS_string = "EPSG:27700")
#> Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
#> prefer_proj): Discarded datum OSGB 1936 in Proj4 definition
wkt_BNGproj <- wkt(BNGproj)
cat(wkt_BNGproj)
#> PROJCRS["OSGB 1936 / British National Grid",
#> BASEGEOGCRS["OSGB 1936",
#> DATUM["OSGB 1936",
#> ELLIPSOID["Airy 1830",6377563.396,299.3249646,
#> LENGTHUNIT["metre",1]]],
#> PRIMEM["Greenwich",0,
#> ANGLEUNIT["degree",0.0174532925199433]],
#> ID["EPSG",4277]],
#> CONVERSION["British National Grid",
#> METHOD["Transverse Mercator",
#> ID["EPSG",9807]],
#> PARAMETER["Latitude of natural origin",49,
#> ANGLEUNIT["degree",0.0174532925199433],
#> ID["EPSG",8801]],
#> PARAMETER["Longitude of natural origin",-2,
#> ANGLEUNIT["degree",0.0174532925199433],
#> ID["EPSG",8802]],
#> PARAMETER["Scale factor at natural origin",0.9996012717,
#> SCALEUNIT["unity",1],
#> ID["EPSG",8805]],
#> PARAMETER["False easting",400000,
#> LENGTHUNIT["metre",1],
#> ID["EPSG",8806]],
#> PARAMETER["False northing",-100000,
#> LENGTHUNIT["metre",1],
#> ID["EPSG",8807]]],
#> CS[Cartesian,2],
#> AXIS["(E)",east,
#> ORDER[1],
#> LENGTHUNIT["metre",1]],
#> AXIS["(N)",north,
#> ORDER[2],
#> LENGTHUNIT["metre",1]],
#> USAGE[
#> SCOPE["Engineering survey, topographic mapping."],
#> AREA["United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore."],
#> BBOX[49.75,-9,61.01,2.01]],
#> ID["EPSG",27700]]Then we can use the spTransform to assign this
new projection to any spatial-class object defined by the
sp library.
England_BNG = spTransform(England_mainland_sp, CRS=BNGproj)In this particular case, England_mainland_sp
was originally created from a pre-existing spatial-class object.
However, we can also create a SpatialPointsDataFrame from the locations
of our observations. The intensity data frame
contains the location where different dragonflies and damselfies have
been observed as part of the British Dragonfly Society Recording Scheme
(2020).
#> lat long OSGR Province
#> 1 51.93120 -0.830612 SP8026 England
#> 2 50.46988 -3.558427 SX8964 England
#> 3 51.40196 -0.187874 TQ261685 England
#> 4 50.54303 -4.824732 SW99957528 England
#> 5 51.40143 -0.182863 TQ2668 England
#> 6 50.81993 -0.390092 TQ1303 England
There are different ways of creating a SpatialPoints
/SpatialPointsDataFrame in R. We could either (1) use the
coordinates() function which transform our
original non-spatial object into a spatial-class object, or (2) use the
SpatialPoints() function to create objects of
class SpatialPoints or SpatialPointsDataFrame from a set of coordinates
and data.frames. I will illustrate the two different approaches in
here:
coordinates(intensity) <- ~ long + lat # this will set our original intensity object to be a SpatialPointsDataFrame class object
slot(intensity, "proj4string") <- CRS(SRS_string = "EPSG:4326") # assign the WGS 84 CRS
intensity<-spTransform(intensity,CRS= BNGproj) # project to the BNGAlternatively,
intensity.BNG <- SpatialPoints(cbind(intensity$long,intensity$lat),
proj4string= CRS(SRS_string = "EPSG:4326")) %>%
spTransform(CRS= BNGproj)
intensity.BNG
#> class : SpatialPoints
#> features : 13435
#> extent : 182215, 655500, 35000.04, 653500 (xmin, xmax, ymin, ymax)
#> crs : +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defsNotice that I used the WGS 84 (World Geodetic System 1984 - EPSG code:4326) to set the original CRS in which the points where measured and then reproject them to the BNG as we did with the England border. We can check if both CRS are the same as follows:
identicalCRS(England_BNG,intensity.BNG)
#> [1] TRUEAs an extra step, I removed any points that fell outside the boundary and then plot them along with England’s boundary.
# remove any observation outside the boundary
intensity_subset <- intensity.BNG[England_BNG, ]#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE
We have now the elements we need to build a mesh that we can use to fit our model. Ideally, the specification of the mesh should not treat the boundaries between England, Wales and Scotland as a coastline. But this is just a simplification of what could be modeled for the whole island (Great Britain). Mesh construction involving the smaller detached Islands could also be addressed in the upcoming Mesh construction tutorial.
The inla.sp2segment() can be used to specify
the boundary and interior constraint edges for building up the mesh.
This is the first function that depends on sp
spatial-class objects. Thus, it is important to convert any other
objects (including sf spatial class) to a sp
spatial-class object such as SpatialPolygons,
SpatialLines,SpatialPolygonsDataFrame.
library(INLA)
england.bdry <- inla.sp2segment(England_BNG)
class(england.bdry)
#> [1] "inla.mesh.segment"
str(england.bdry)
#> List of 5
#> $ loc : num [1:88304, 1:2] 565125 565125 565086 565089 565050 ...
#> $ idx : int [1:88304, 1:2] 1 88304 88303 88302 88301 88300 88299 88298 88297 88296 ...
#> $ grp : int [1:88304, 1] 1 1 1 1 1 1 1 1 1 1 ...
#> $ is.bnd: logi FALSE
#> $ crs :Formal class 'CRS' [package "sp"] with 1 slot
#> .. ..@ projargs: chr "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs"
#> .. ..$ comment: chr "PROJCRS[\"OSGB 1936 / British National Grid\",\n BASEGEOGCRS[\"OSGB 1936\",\n DATUM[\"OSGB 1936\",\n "| __truncated__
#> - attr(*, "class")= chr "inla.mesh.segment"The england.bdry is a
inla.mesh.segment-class object that contains the
CRS associated with the spatial-class object used to define it (BNG in
this case). However, if we build the mesh and DO NOT specify the CRS,
the mesh won’t be have any CRS associated with it and will result in
unexpected warnings or errors when trying to predict on a smaller grid
that has a CRS. You will usually get a bunch of warnings (see https://rsbivand.github.io/ECS530_h20/ECS530_III.html
and https://psfaculty.plantsciences.ucdavis.edu/plant/additionaltopics_datumwarning.pdf
for details on why you get these warnings and how to mute them).
Unfortunately, this is something we will need to hold on for the moment
while INLA source code migrates to the sf
environment somewhere in the future (or maybe something inlabru can take
over?).
max.edge = 20000
mesh = inla.mesh.2d(boundary = england.bdry,
offset = 50000,
max.edge = c(1.5, 2.5) * max.edge,
cutoff = 15000,
crs = BNGproj)
plot(mesh)
At this stage we can now fit our LGCP. In principle we do no have to
worry about CRS here. However, it is true that the choice of priors is
closely related to the spatial scale of the study and the units
(e.g. meters or km) in which the data has been projected. So, the CRS
projection should be handled properly, for which I suggest using the
sf environment first to do the projections or
change units and then convert the appropriate object to a
sp spatial class.
library(inlabru)
matern <- inla.spde2.pcmatern(mesh,
prior.sigma = c(1, 0.01),
prior.range = c(5, 0.01))
cmp <- coordinates ~ spat(main = coordinates, model = matern) + Intercept(1)
fit = lgcp(cmp, data = intensity_subset,samplers = England_BNG,domain = list(coordinates = mesh) )
plot(fit, "Intercept")
I will not go into the modelling detail as this will be probably be
covered on a different session. So I’ll just focus on some of the
function that depend on the different spatial-structure and CRS. For
example, if we wanted to predict the intensity,
inlabru uses the posterior samples to evaluate on
the fine grid. In this example we will use
pixels()function from
inlabru to create a SpatialPixels class-object
over the mesh we’ve just created. (Notice that if you did not specify
any CRS for your mesh, an error will occur if the Spatial object you are
using as a mask contains a CRS attribute).
pxl<- pixels(mesh, mask = England_BNG,nx = 150,ny = 150)
plot(geometry(pxl))pred <- predict(fit,pxl,
formula ~ data.frame(
lambda = exp(spat + Intercept),
loglambda = spat + Intercept
)
)
ggplot() +
gg(pred$lambda) +
gg(England_BNG) +
ggtitle("LGCP fit to Points", subtitle = "(Response Scale)") +
coord_fixed()
#> Regions defined for each PolygonsNow we will take a look at the different steps we took above using
the sf environment instead.
Any shape file in your system can be read direct using the
st_read(). In this example, I’ll convert the
England SpatialPolygonsDataFrame to a simple feature object that we can
manipulate and visualize within the tidyverse DSLs. The CRS for spatial
objects of class sf or stars
can be retrieved using the st_crs function , or be
set or changed via st_set_crs using pipeline
command (notice that simply replacing the CRS does not reproject the
data, we should use st_transform for this
means).
In the code below I:
st_transform() (Equivalent to
spTransform()) to project the original CRS using
the EPSG code for the BNG and change the units from meters to km by
accessing the PROJ.4 string attribute.geom_sf within ggplot.# build an sf object
England_sf = st_as_sf(England) %>% st_transform(crs = 27700)
England_sf = st_transform(England_sf,gsub("units=m","units=km",st_crs(England_sf)$proj4string))
England_mainland <- ms_filter_islands(England_sf, min_area = 2000)
England_mainland
#> Simple feature collection with 1 feature and 10 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: 134.0774 ymin: 11.09554 xmax: 655.6956 ymax: 656.7911
#> CRS: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=km +no_defs
#> GID_0 NAME_0 GID_1 NAME_1 VARNAME_1 NL_NAME_1
#> 1 GBR United Kingdom GBR.1_1 England <NA> <NA>
#> TYPE_1 ENGTYPE_1 CC_1 HASC_1
#> 1 Home Nation|Constituent Country Kingdom <NA> <NA>
#> geometry
#> 1 POLYGON ((564.9785 102.5189...
ggplot()+
geom_sf(data=England_mainland)We can query information on the CRS and projection as follows:
# retrieve the PROJ.4 attribute
st_crs(England_mainland)$proj4string
#> [1] "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=km +no_defs"
# check whether longitude-latitude projection is still being applied
st_is_longlat(England_mainland)
#> [1] FALSE
# Check the spatial units of our projection
st_crs(England_mainland)$units
#> [1] "km"Now we define a simple feature object from our data points and re project it using the CRS we defined above (notice that Geometry type informs us about the “spatial class” in a similar manner to sp spatial class objects.
intensity.proj <- st_as_sf(x = intensity,
coords = c("long", "lat"),
crs = "+proj=longlat +datum=WGS84")%>% st_transform(st_crs(England_mainland))
intensity_subset <- intensity.proj[England_mainland, ]
intensity_subset
#> Simple feature collection with 13180 features and 2 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 182.146 ymin: 35.5694 xmax: 654.6209 ymax: 653.4977
#> CRS: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=km +no_defs
#> First 10 features:
#> OSGR Province geometry
#> 1 SP8026 England POINT (480.3931 226.5483)
#> 2 SX8964 England POINT (289.4176 64.56643)
#> 3 TQ261685 England POINT (526.0376 168.6053)
#> 4 SW99957528 England POINT (199.8838 75.34989)
#> 5 TQ2668 England POINT (526.3875 168.5553)
#> 6 TQ1303 England POINT (513.3894 103.5627)
#> 7 TF8843 England POINT (588.3791 343.5351)
#> 8 SD120813 England POINT (311.9634 481.3678)
#> 9 TF8844 England POINT (588.379 344.535)
#> 10 TG0820 England POINT (608.3766 320.5378)The final step is to transform sf-class objects
to a sp spatial-structure. The we can use this
object to produce the mesh and fit our model.
# Build the mesh
England_mainland_sp <- as(England_mainland, "Spatial")
intensity_subset_sp <-as(intensity_subset, "Spatial")
england.bdry <- England_mainland_sp %>% inla.sp2segment()
max.edge = 20
mesh = inla.mesh.2d(boundary = england.bdry,
crs = st_crs(England_mainland),offset = 50,
max.edge = c(1.5, 2.5) * max.edge,
cutoff = 15)
plot(mesh)
matern <- inla.spde2.pcmatern(mesh,
prior.sigma = c(1, 0.01),
prior.range = c(0.5, 0.01))
cmp <- coordinates ~ spat(main = coordinates, model = matern) + Intercept(1)
fit = lgcp(cmp, data = intensity_subset_sp,samplers = England_mainland_sp,domain = list(coordinates = mesh) )Recall that a SpatialPixelsDataFrame, created with the
pixel function in inlabru,
was used to make this predictions. I included a few lines to this
function so it can work internally with both sp
and sf spatial structures. Maybe functions such as
inlabru::predict(),
inla.mesh.segment() and
inla.mesh.create() could incorporate some of this
compatibility with sf in future updates.
pixels_sf <- function (mesh, nx = 150, ny = 150, mask = TRUE)
{
if (length(nx) == 1) {
x <- seq(min(mesh$loc[, 1]), max(mesh$loc[, 1]), length = nx)
}
else {
x <- nx
}
if (length(ny) == 1) {
y <- seq(min(mesh$loc[, 2]), max(mesh$loc[, 2]), length = ny)
}
else {
y <- ny
}
pixels <- expand.grid(x = x, y = y)
pixels_sf<- st_as_sf(x = pixels,
coords = c("x", "y"))
# old sp specification
#coordinates(pixels) <- c("x", "y")
if (!is.null(mesh$crs)) {
# updated sp specification
#slot(pixels, "proj4string") <- mesh$crs
st_crs(pixels_sf) <- mesh$crs
}
if (is.logical(mask)) {
if (mask) {
pixels <- pixels[is.inside(mesh, coordinates(pixels))]
}
}
else {
if (inherits(mask, "SpatialPolygonsDataFrame")) {
mask <- as(mask, "SpatialPolygons")
mask <- st_as_sf(mask)
} else if ( inherits(mask, "sf")){
mask <- mask
}
pixels <- st_filter(pixels_sf, mask)
# old sp clipping function
#pixels <- pixels[!is.na(sp::over(pixels, mask))]
}
pixels <- as(pixels, "Spatial")
#pixels <- as(pixels, "SpatialPixelsDataFrame")
pixels <- sp::SpatialPixelsDataFrame(pixels, data = data.frame(matrix(nrow = NROW(pixels), ncol = 0)))
pixels
}
pxl<- pixels_sf(mesh,mask = England_mainland,nx = 300,ny = 300)
class(pxl)
#> [1] "SpatialPixelsDataFrame"
#> attr(,"package")
#> [1] "sp"
pred <- predict(fit,pxl,
formula ~ data.frame(
lambda = exp(spat + Intercept),
loglambda = spat + Intercept
)
)
ggplot() +
gg(pred$lambda) +
ggtitle("LGCP fit to Points", subtitle = "(Response Scale)") +
coord_fixed()