As the geoR library is not currently available at CRAN, this notebook illustrates how to conduct spatial interpolation of point data using the stars, geostat and automap libraries.
The code written here was adapted from this book.
The processing steps include:
Interpolate using three methods:
Calculate an empirical variogram (OK and UK)
Fit a variogram model (OK and UK)
Evaluate interpolation accuracy using Leave-One-Out Cross Validation
Clean the R memory:
#uncomment the following line
#rm(list=ls())
Make sure to install the following libraries from the console: sf, stars, gstat, automap, leaflet, and leafem.
Then, load the libraries:
library(sf)
library(stars)
library(gstat)
library(automap)
library(leaflet)
library(leafem)
(boyaca <- st_read("./datos/15_BOYACA/ADMINISTRATIVO/MGN_MPIO_POLITICO.shp"))
## Reading layer `MGN_MPIO_POLITICO' from data source
## `/Users/ivanlizarazo/Documents/GB/datos/15_BOYACA/ADMINISTRATIVO/MGN_MPIO_POLITICO.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 123 features and 2 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -74.66496 ymin: 4.655196 xmax: -71.94885 ymax: 7.055557
## Geodetic CRS: WGS 84
Previously, we downloaded climate data from TerraClimate and saved a geotif file with the target variable. Now, we read it:
archivo <- ("./datos/precip_boyaca_X2019.07.tif")
(precip <- read_stars(archivo))
## stars object with 2 dimensions and 1 attribute
## attribute(s):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## precip_boyaca_X2019.07.tif 18 66 102 125.0439 181 399
## dimension(s):
## from to offset delta refsys point values x/y
## x 1 66 -74.6667 0.0416667 WGS 84 FALSE NULL [x]
## y 1 59 7.08333 -0.0416667 WGS 84 FALSE NULL [y]
Note the CRS for this raster as well as the cell size.
Let’s crop the precipitation layer to match the boundaries of our study area:
precip.mask <- st_crop(precip, boyaca)
precip
## stars object with 2 dimensions and 1 attribute
## attribute(s):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## precip_boyaca_X2019.07.tif 18 66 102 125.0439 181 399
## dimension(s):
## from to offset delta refsys point values x/y
## x 1 66 -74.6667 0.0416667 WGS 84 FALSE NULL [x]
## y 1 59 7.08333 -0.0416667 WGS 84 FALSE NULL [y]
Visualize the raster:
m <- leaflet() %>%
addTiles() %>%
leafem:::addGeoRaster(
precip.mask,
opacity = 0.7,
colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "blue"),
domain = 18:400)
)
#
m # Print the map
As explained in class, we need a sample dataset as input to the interpolation process. Here, we simulate such a dataset by using the available climate variable:
# raster to vector conversion
precip.pts <- st_as_sf(precip.mask, as_points = TRUE, merge = FALSE, long = TRUE)
What we got?
precip.pts
dt = sort(sample(1083, 1083*.7))
train<-precip.pts[dt,]
test<-precip.pts[-dt,]
train
longit <- st_coordinates(train)[,1]
latit <- st_coordinates(train)[,2]
rain <- train$precip_boyaca_X2019.07.tif
id <- seq(1,758,1)
(sitios <- data.frame(id, longit, latit, rain))
Let’s visualize the train sites:
m <- leaflet() %>%
addTiles() %>%
leafem:::addGeoRaster(
precip.mask,
opacity = 0.7,
colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "blue"),
domain = 15:400)
) %>%
addMarkers(lng=sitios$longit,lat=sitios$latit, popup=sitios$rain, clusterOptions = markerClusterOptions())
m # Print the map
In a previous practical, we downloaded a digital elevation model and saved it to disk. Now, we need it:
(dem = read_stars("./datos/dem_boyaca_1km.tif"))
## stars object with 2 dimensions and 1 attribute
## attribute(s):
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## dem_boyaca_1km.tif 121 1512 2484 2253.167 3002 5140 66118
## dimension(s):
## from to offset delta refsys point values x/y
## x 1 325 -74.6583 0.00833333 WGS 84 FALSE NULL [x]
## y 1 287 7.05 -0.00833333 WGS 84 FALSE NULL [y]
Note the CRS for this DEM as well as the cell size.
How does the elevation look?
pal.dem <- colorNumeric(palette = c("forestgreen", "green", "yellow", "brown", "lightcyan"), domain = 500:4000, na.color = "transparent")
m <- leaflet() %>%
addTiles() %>%
leafem:::addGeoRaster(
dem,
opacity = 0.7,
colorOptions = colorOptions(palette = c("forestgreen", "green", "yellow", "brown", "lightcyan"),
domain = 500:4000)
) %>%
addMarkers(lng=sitios$longit,lat=sitios$latit, popup=sitios$rain, clusterOptions = markerClusterOptions()) %>%
addLegend("bottomright", pal = pal.dem, values= dem$dem_boyaca_1km.tif,
title = "Elevation")
## Warning in pal(c(r[1], cuts, r[2])): Some values were outside the color scale
## and will be treated as NA
m # Print the map
We need to “join” elevation value corresponding to each train site:
# make sure the crs are identical
st_crs(dem) <- st_crs(train)
Now, extract the elevation values:
(train = st_join(train, st_as_sf(dem)))
and subset those sites that coincide with the raster:
train = train[!is.na(train$dem_boyaca_1km), ]
names(train) <- c("precip", "elev", "geometry")
train
To interpolate, we first need to create an object of class gstat, using a function of the same name: gstat.
A gstat object contains all necessary information to conduct spatial interpolation, namely:
Based on its arguments, the gstat function “understands” what type of interpolation model we want to use:
We are going to use three parameters of the gstat function:
To interpolate the precipitation using the IDW method we create the following gstat object, specifying just the formula and data:
g1 = gstat(formula = precip ~ 1, data = train)
Now that our interpolation model g1 is defined, we can use the predict function to actually interpolate, i.e., to estimate precipitation values.
The predict function accepts:
The raster serves for two purposes:
For example, the following expression interpolates precipitation values according to the model defined in g1 and the raster template defined in dem:
## [inverse distance weighted interpolation]
z1 = predict(g1, dem)
## [inverse distance weighted interpolation]
What is z1?
z1
## stars object with 2 dimensions and 2 attributes
## attribute(s):
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## var1.pred 23 74.90023 100.8848 121.9123 175.9314 335 66118
## var1.var NA NA NA NaN NA NA 93275
## dimension(s):
## from to offset delta refsys point values x/y
## x 1 325 -74.6583 0.00833333 WGS 84 NA NULL [x]
## y 1 287 7.05 -0.00833333 WGS 84 NA NULL [y]
We can subset just the first attribute and rename it to “precipitation”:
z1 = z1["var1.pred",,]
names(z1) = "precipitation"
The interpolated rainfall raster, using IDW, is shown in next figure:
b = seq(15, 400, 15)
plot(z1, breaks = b, col = hcl.colors(length(b)-1, "Spectral"), reset = FALSE)
plot(st_geometry(train), pch = 3, add = TRUE)
contour(z1, breaks = b, add = TRUE)
Kriging methods require a variogram model. The variogram model is an objective way to quantify the autocorrelation pattern in the data, and assign weights accordingly when making predictions.
As a first step, we can calculate and examine the empirical variogram using the variogram function.
The function requires two arguments:
For example, the following expression calculates the empirical variogram of train, with no covariates:
v_emp_ok = variogram(precip ~ 1, data=train)
Using plot to examine the variogram:
plot(v_emp_ok)
There are several ways to fit a variogram model to an empirical variogram. We will use the simplest one—automatic fitting using function autofitVariogram from package automap:
v_mod_ok = autofitVariogram(precip ~ 1, as(train, "Spatial"))
The function chooses the best fitting type of model, and also fine tunes its parameters. You may use show.vgms() to display variogram model types.
Note that the autofitVariogram function does not work on sf objects, which is why we convert the object to a SpatialPointsDataFrame (package sp).
The fitted model can be plotted with plot:
plot(v_mod_ok)
The resulting object is actually a list with several components, including the empirical variogram and the fitted variogram model. The $var_model component of the resulting object contains the actual model:
v_mod_ok$var_model
The variogram model can then be passed to the gstat function, and we can carry on with the Ordinary Kriging interpolation:
## [using ordinary kriging]
g2 = gstat(formula = precip ~ 1, model = v_mod_ok$var_model, data = train)
z2= predict(g2, dem)
## [using ordinary kriging]
Again, we will subset the predicted values attribute and rename it:
z2 = z2["var1.pred",,]
names(z2) = "precip"
The Ordinary Kriging predictions are shown in next figure:
b = seq(15, 400, 15)
plot(z2, breaks = b, col = hcl.colors(length(b)-1, "Spectral"), reset = FALSE)
plot(st_geometry(train), pch = 3, add = TRUE)
contour(z2, breaks = b, add = TRUE)
Universal Kriging interpolation uses a model with one or more independent variables, i.e., covariates. The covariates need to be known for both:
The formula now specifies the name(s) of the covariate(s) to the right of the ~ symbol, separated by + if there are more than one. Also, we are using a subset of rainfall where elev_1km values were present:
v_emp_uk = variogram(precip ~ elev, train)
train.sp = as(train, "Spatial")
v_mod_uk = autofitVariogram(precip ~ elev, train.sp)
Comparing the Ordinary Kriging and Universal Kriging variogram models:
plot(v_emp_ok, model = v_mod_ok$var_model, ylim = c(0, 7000), main = "OK")
plot(v_emp_uk, model = v_mod_uk$var_model, ylim = c(0, 7000), main = "UK")
Next we create a gstat object, where the formula contains the covariate and the corresponding variogram model:
g3 = gstat(formula = precip ~ elev, model = v_mod_uk$var_model, data = train.sp)
Remember that all of the variables that appear in the formula need to be present in the data. In this case we have two variables: a dependent variable (precip) and an independent variable (elev).
Make sure that the dem object provides the independent variable (i.e. same name as in the fitted model):
names(dem)
## [1] "dem_boyaca_1km.tif"
names(dem) <- "elev"
Now we can make predictions:
## [using universal kriging]
z3 = predict(g3, dem)
## [using universal kriging]
and then subset and rename:
z3 = z3["var1.pred",,]
names(z3) = "precipitation"
Universal Kriging predictions are shown in the following figure:
b = seq(15, 400, 15)
plot(z3, breaks = b, col = hcl.colors(length(b)-1, "Spectral"), reset = FALSE)
plot(st_geometry(train), pch = 3, add = TRUE)
contour(z3, breaks = b, add = TRUE)
Another view of the three interpolation outputs:
paleta <- colorNumeric(palette = c("orange", "yellow", "lightcyan", "blue", "darkblue"), domain = 10:400, na.color = "transparent")
colores <- colorOptions(palette = c("orange", "yellow", "lightcyan", "blue", "darkblue"), domain = 10:400, na.color = "transparent")
m <- leaflet() %>%
addTiles() %>%
addGeoRaster(z1, opacity = 0.7, colorOptions = colores, group="IDW") %>%
addGeoRaster(z2, colorOptions = colores, opacity = 0.7, group= "OK") %>%
addGeoRaster(z3, colorOptions = colores, opacity = 0.7, group= "UK") %>%
# Add layers controls
addLayersControl(
overlayGroups = c("UK", "OK", "IDW"),
options = layersControlOptions(collapsed = FALSE)
) %>%
addLegend("bottomright", pal = paleta, values= z1$precipitation,
title = "Precipitation (2019)"
)
m # Print the map
In your notebook, examine and compare the three results.
We have estimated climate surfaces using three different methods: IDW, Ordinary Kriging and Universal Kriging. Although it is useful to examine and compare the results graphically, we also need an objective way to evaluate interpolation accuracy. That way, we can objectively choose the most accurate method among the available interpolation methods.
In Leave-One-Out Cross Validation we:
In the end, what we get is a table with an observed value and a predicted value for all points.
We can run Leave-One-Out Cross Validation using the gstat.cv function, which accepts a gstat object.
When writing the following chunk, hide the results as explained in this link:
### run the following code from the console -- line-by-line
cv1 = gstat.cv(g1)
#cv2 = gstat.cv(g2)
#cv3 = gstat.cv(g3)
The gstat.cv function returns an object of class SpatialPointsDataFrame (package sp), which we can convert to an sf object with st_as_sf:
### run the following code from the console -- line-by-line
cv1 = st_as_sf(cv1)
#cv2 = st_as_sf(cv2)
#cv3 = st_as_sf(cv3)
The result of gstat.cv has the following attributes:
A bubble plot is convenient to examine the residuals, since it shows positive and negative values in different color. Here, I plot just a residual for one result. You shoud include three plots in your notebook.
bubble(as(cv1[, "residual"], "Spatial"))
Using the “predicted” and “observed” columns we can calculate prediction accuracy indices, such as the Root Mean Square Error (RMSE):
\[RMSE = \sqrt{\frac{\sum_{i=1}^{n}(pred_i-obs_i)^2}{n}}\]
where \(pred_i\) and \(obs_i\) are predicted and observed values for point \(i\), respectively.
For example:
# This is the RMSE value for the IDW interpolation
sqrt(sum((cv1$var1.pred - cv1$observed)^2) / nrow(cv1))
## [1] 36.21553
In your notebook, examine and compare the three RMSE values (i.e. IDW, OK, and UK). In addition, make a clear statement on the quality of the interpolation you conducted. All this need to be included in your Informe No. 3.
Cite as: Lizarazo, I., 2022. Spatial interpolation of climate data. Available at https://rpubs.com/ials2un/clim_interp.
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] leafem_0.2.0 leaflet_2.1.1 automap_1.0-16 sp_1.5-0 gstat_2.0-9
## [6] stars_0.5-5 abind_1.4-5 sf_1.0-7
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.8.3 lattice_0.20-45 FNN_1.1.3.1 png_0.1-7
## [5] class_7.3-20 zoo_1.8-10 digest_0.6.29 utf8_1.2.2
## [9] R6_2.5.1 plyr_1.8.7 evaluate_0.15 e1071_1.7-11
## [13] highr_0.9 ggplot2_3.3.6 pillar_1.7.0 rlang_1.0.2
## [17] rstudioapi_0.13 raster_3.5-15 jquerylib_0.1.4 rmarkdown_2.14
## [21] rgdal_1.5-30 stringr_1.4.0 foreign_0.8-82 htmlwidgets_1.5.4
## [25] munsell_0.5.0 proxy_0.4-27 compiler_4.2.0 xfun_0.31
## [29] pkgconfig_2.0.3 base64enc_0.1-3 htmltools_0.5.2 tibble_3.1.7
## [33] codetools_0.2-18 intervals_0.15.2 reshape_0.8.9 fansi_1.0.3
## [37] spacetime_1.2-7 crayon_1.5.1 wk_0.6.0 grid_4.2.0
## [41] jsonlite_1.8.0 lwgeom_0.2-8 gtable_0.3.0 lifecycle_1.0.1
## [45] DBI_1.1.2 magrittr_2.0.3 units_0.8-0 scales_1.2.0
## [49] KernSmooth_2.23-20 cli_3.3.0 stringi_1.7.6 farver_2.1.0
## [53] bslib_0.3.1 ellipsis_0.3.2 xts_0.12.1 vctrs_0.4.1
## [57] s2_1.0.7 tools_4.2.0 glue_1.6.2 crosstalk_1.2.0
## [61] parallel_4.2.0 fastmap_1.1.0 yaml_2.3.5 terra_1.5-21
## [65] colorspace_2.0-3 maptools_1.1-4 classInt_0.4-7 knitr_1.39
## [69] sass_0.4.1