1. Introduction

This notebook illustrates how to conduct spatial interpolation in R using precipitation data as example. This notebook is written to help Geomatica Basica students at Universidad Nacional de Colombia to become familiar with geospatial functionalities provided by the R software environment.

This notebook includes the following sections: * Downloading global CHIRPS data * Pre-processing CHIRPS data * Interpolating CHIRPS data

The interpolation section follows closely the code written by Manuel Gimond. Many thanks!

2. Downloading global CHIRPS data

Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) is a 35+ year quasi-global rainfall data set. Spanning 50°S-50°N (and all longitudes) and ranging from 1981 to near-present, CHIRPS incorporates our in-house climatology, CHPclim, 0.05° resolution satellite imagery, and in-situ station data to create gridded rainfall time series for trend analysis and seasonal drought monitoring. You can find more information about CHIRPS here

Let’s start by loading the required libraries:

library(rgdal)
## Loading required package: sp
## rgdal: version: 1.4-8, (SVN revision 845)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.4.2, released 2019/06/28
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/gdal
##  GDAL binary built with GEOS: FALSE 
##  Loaded PROJ.4 runtime: Rel. 5.2.0, September 15th, 2018, [PJ_VERSION: 520]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/proj
##  Linking to sp version: 1.3-2
library(raster)
library(sf)
## Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.4
## ✓ tibble  3.0.1     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.3     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x tidyr::extract() masks raster::extract()
## x dplyr::filter()  masks stats::filter()
## x dplyr::lag()     masks stats::lag()
## x dplyr::select()  masks raster::select()
library(tmap)
library(gstat)
library(sp)

3. Pre-processing CHIRPS data

Read the uncompressed CHIRPS file representing the cumulative rainfall in the last days of April 2020 (i.e. the precipitation from 26th to 30 th April).

## Read the CHIRPS dataset
(precip <- raster("./chirps/chirps-v2.0.2020.04.6.tif"))
## class      : RasterLayer 
## dimensions : 2000, 7200, 14400000  (nrow, ncol, ncell)
## resolution : 0.05, 0.05  (x, y)
## extent     : -180, 180, -50, 50  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : /Users/ivanlizarazo/Documents/ivan/GB/notebooks/chirps/chirps-v2.0.2020.04.6.tif 
## names      : chirps.v2.0.2020.04.6

This is a dataset with global coverage. Its CRS is a geographic coordinate system.

Let’s load a shapefile representing our area-of-interest:

(colombia <- st_read("./chirps/COL_adm1.shp"))
## Reading layer `COL_adm1' from data source `/Users/ivanlizarazo/Documents/ivan/GB/notebooks/chirps/COL_adm1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 32 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -81.84153 ymin: -4.228429 xmax: -66.87033 ymax: 15.91247
## CRS:            4326
## Simple feature collection with 32 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -81.84153 ymin: -4.228429 xmax: -66.87033 ymax: 15.91247
## CRS:            4326
## First 10 features:
##    ID_0 ISO   NAME_0 ID_1    NAME_1       TYPE_1   ENGTYPE_1 NL_NAME_1
## 1    53 COL Colombia    1  Amazonas    Comisaría Commissiary      <NA>
## 2    53 COL Colombia    2 Antioquia Departamento  Department      <NA>
## 3    53 COL Colombia    3    Arauca  Intendencia  Intendancy      <NA>
## 4    53 COL Colombia    4 Atlántico Departamento  Department      <NA>
## 5    53 COL Colombia    5   Bolívar Departamento  Department      <NA>
## 6    53 COL Colombia    6    Boyacá Departamento  Department      <NA>
## 7    53 COL Colombia    7   Córdoba Departamento  Department      <NA>
## 8    53 COL Colombia    8    Caldas Departamento  Department      <NA>
## 9    53 COL Colombia    9   Caquetá  Intendencia  Intendancy      <NA>
## 10   53 COL Colombia   10  Casanare  Intendencia  Intendancy      <NA>
##    VARNAME_1                       geometry
## 1       <NA> MULTIPOLYGON (((-69.43138 -...
## 2       <NA> MULTIPOLYGON (((-76.92486 8...
## 3       <NA> MULTIPOLYGON (((-70.6878 7....
## 4       <NA> MULTIPOLYGON (((-74.84764 1...
## 5       <NA> MULTIPOLYGON (((-75.67486 1...
## 6       <NA> MULTIPOLYGON (((-72.213 7.0...
## 7       <NA> MULTIPOLYGON (((-76.34153 9...
## 8       <NA> MULTIPOLYGON (((-74.695 5.7...
## 9       <NA> MULTIPOLYGON (((-74.6926 2....
## 10      <NA> MULTIPOLYGON (((-71.1686 6....

Let’s select several departments in the Colombian Caribbean:

(aoi <- filter(colombia, NAME_1 == "Sucre" | NAME_1 == "Bolívar" | NAME_1 == "Córdoba" | NAME_1 == "Magdalena"))
## Simple feature collection with 4 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -76.5189 ymin: 6.979202 xmax: -73.6021 ymax: 11.34958
## CRS:            4326
##   ID_0 ISO   NAME_0 ID_1    NAME_1       TYPE_1  ENGTYPE_1 NL_NAME_1
## 1   53 COL Colombia    5   Bolívar Departamento Department      <NA>
## 2   53 COL Colombia    7   Córdoba Departamento Department      <NA>
## 3   53 COL Colombia   19 Magdalena Departamento Department      <NA>
## 4   53 COL Colombia   28     Sucre Departamento Department      <NA>
##      VARNAME_1                       geometry
## 1         <NA> MULTIPOLYGON (((-75.67486 1...
## 2         <NA> MULTIPOLYGON (((-76.34153 9...
## 3 La Magdalena MULTIPOLYGON (((-74.42792 1...
## 4         <NA> MULTIPOLYGON (((-75.70236 9...

Now, let’s clip the precipitation data:

# Crop precipitation data by extent of area-of-interest
precip.crop <- raster::crop(precip, extent(aoi))

Now, mask the raster layer:

precip.mask <- mask(x = precip.crop, mask = aoi)

Check the output:

precip.mask 
## class      : RasterLayer 
## dimensions : 87, 58, 5046  (nrow, ncol, ncell)
## resolution : 0.05, 0.05  (x, y)
## extent     : -76.5, -73.6, 6.999999, 11.35  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : memory
## names      : chirps.v2.0.2020.04.6 
## values     : 0.004840055, 87.19588  (min, max)

Plot the masked raster layer:

library(leaflet)
library(RColorBrewer)
pal <- colorNumeric(c("red", "orange", "yellow", "blue", "darkblue"), values(precip.mask),
  na.color = "transparent")

leaflet() %>% addTiles() %>%
  addRasterImage(precip.mask, colors = pal, opacity = 0.6) %>%
  addLegend(pal = pal, values = values(precip.mask),
    title = "CHIRPS rainfall in the Caribbean from 26.04 to 30.04 in 2020 [mm]")

Conversion of raster into points:

(precip.points <- rasterToPoints(precip.mask, spatial = TRUE))
## class       : SpatialPointsDataFrame 
## features    : 2729 
## extent      : -76.475, -73.625, 7.024999, 11.325  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 1
## names       : chirps.v2.0.2020.04.6 
## min values  :   0.00484005501493812 
## max values  :      87.1958770751953
names(precip.points) <- "rain"
precip.points
## class       : SpatialPointsDataFrame 
## features    : 2729 
## extent      : -76.475, -73.625, 7.024999, 11.325  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 1
## names       :                rain 
## min values  : 0.00484005501493812 
## max values  :    87.1958770751953
str(precip.points)
## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  2729 obs. of  1 variable:
##   .. ..$ rain: num [1:2729] 0.04358 0.0903 0.14898 0.15437 0.00484 ...
##   ..@ coords.nrs : num(0) 
##   ..@ coords     : num [1:2729, 1:2] -74.1 -74.1 -74 -74 -74.2 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:2] "x" "y"
##   ..@ bbox       : num [1:2, 1:2] -76.47 7.02 -73.62 11.32
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "x" "y"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
##   .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

Let’s plot the points:

plot(precip.mask, main= "CHIRPS rainfall in the Caribbean from 26 to 30.04.2020 [mm]")
plot(aoi, add=TRUE)
points(precip.points$x, precip.points$y, col = "white", cex = .05)

aoi$area =  st_area(aoi)
aoi
## Simple feature collection with 4 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -76.5189 ymin: 6.979202 xmax: -73.6021 ymax: 11.34958
## CRS:            4326
##   ID_0 ISO   NAME_0 ID_1    NAME_1       TYPE_1  ENGTYPE_1 NL_NAME_1
## 1   53 COL Colombia    5   Bolívar Departamento Department      <NA>
## 2   53 COL Colombia    7   Córdoba Departamento Department      <NA>
## 3   53 COL Colombia   19 Magdalena Departamento Department      <NA>
## 4   53 COL Colombia   28     Sucre Departamento Department      <NA>
##      VARNAME_1                       geometry              area
## 1         <NA> MULTIPOLYGON (((-75.67486 1... 26034571858 [m^2]
## 2         <NA> MULTIPOLYGON (((-76.34153 9... 24920893870 [m^2]
## 3 La Magdalena MULTIPOLYGON (((-74.42792 1... 21546629848 [m^2]
## 4         <NA> MULTIPOLYGON (((-75.70236 9... 10807536936 [m^2]

Now, dissolve the internal boundaries:

(border_sf <-
  aoi %>%
  summarise(area = sum(area)/1000000))
## Simple feature collection with 1 feature and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -76.5189 ymin: 6.979202 xmax: -73.6021 ymax: 11.34958
## CRS:            4326
##             area                       geometry
## 1 83309.63 [m^2] MULTIPOLYGON (((-76.34153 9...

Convert the spatial feature into a spatial dataframe:

 (border <- as(border_sf, 'Spatial')) 
## class       : SpatialPolygonsDataFrame 
## features    : 1 
## extent      : -76.5189, -73.6021, 6.979202, 11.34958  (xmin, xmax, ymin, ymax)
## Warning in proj4string(x): CRS object has comment, which is lost in output
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 1
## names       :             area 
## value       : 83309.6325121808

Two reprojection tasks:

p.sf.magna <- st_transform(st_as_sf(precip.points), crs=3116)
caribe.sf.magna <- st_transform(aoi, crs=3116)

An object class conversion:

precip2 <- as(p.sf.magna, 'Spatial')

Rounding precipitation values:

precip2$rainfall <- round(precip2$rain, 1)
##precip2$rainfall <- round(precip2$chirps.v2.0.2020.03.6, 1)

The output:

precip2
## class       : SpatialPointsDataFrame 
## features    : 2729 
## extent      : 735351.2, 1049499, 1268593, 1744186  (xmin, xmax, ymin, ymax)
## Warning in proj4string(x): CRS object has comment, which is lost in output
## crs         : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## variables   : 2
## names       :                rain, rainfall 
## min values  : 0.00484005501493812,        0 
## max values  :    87.1958770751953,     87.2

Change object class:

(caribe2 <- as(caribe.sf.magna, 'Spatial'))
## class       : SpatialPolygonsDataFrame 
## features    : 4 
## extent      : 730554.4, 1052004, 1263533, 1746903  (xmin, xmax, ymin, ymax)
## Warning in proj4string(x): CRS object has comment, which is lost in output
## crs         : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## variables   : 10
## names       : ID_0, ISO,   NAME_0, ID_1,  NAME_1,       TYPE_1,  ENGTYPE_1, NL_NAME_1,    VARNAME_1,            area 
## min values  :   53, COL, Colombia,    5, Bolívar, Departamento, Department,        NA, La Magdalena, 10807536936.193 
## max values  :   53, COL, Colombia,   28,   Sucre, Departamento, Department,        NA, La Magdalena,  26034571857.66

Make sure the two extents match:

# Replace point boundary extent with that of Montes
precip2@bbox <- caribe2@bbox

Splitting data into training and validation datasets

train_index <- sample(1:nrow(precip2), 0.7 * nrow(precip2))
test_index <- setdiff(1:nrow(precip2), train_index)
ptos_train <- precip2[train_index, ]
ptos_test  <- precip2[test_index,]
ptrain <- spTransform(ptos_train, crs(precip.mask))
ptest <- spTransform(ptos_test, crs(precip.mask))

Plotting the data

#install.packages("htmltools")
library(htmltools)
library(leaflet.extras)
# first prepare a leaflet plot ...
lplot <- leaflet(data = precip2) %>% # data = original body - to get the zoom right
  addProviderTiles("CartoDB.Positron") %>% 
  addRasterImage(precip.mask, colors = pal, opacity = 0.6) %>%
  addCircleMarkers(data = ptrain, # first group
                   radius = 1,
                   fillOpacity = .7,
                   stroke = FALSE,
                   popup = ~htmlEscape(rainfall),
                   color = pal(ptos_train$rainfall), # using already created palette
                   clusterOptions = markerClusterOptions(),
                   group = "Training") %>% 
  addCircleMarkers(data = ptest, # second group
                   radius = 10,
                   fillOpacity = .7,
                   stroke = FALSE,
                   popup = ~htmlEscape(rainfall),
                   color = pal(ptos_test$rainfall), # using already created palette
                   clusterOptions = markerClusterOptions(),
                   group = "Test") %>% 
  addLegend(position = "bottomright",
            values = ~rainfall,
            opacity = .7,
            pal = pal, # palette declared previously
            title = "Precipitation") %>% 
  leaflet::addLayersControl(overlayGroups = c("Training", "Test"),
                   options = layersControlOptions(collapsed = FALSE)) %>% 
  addResetMapButton()
lplot #  ... display the map

4. Interpolating CHIRPS data

4.1 Thiessen polygons

The Thiessen polygons (or proximity interpolation) can be created using spatstat’s dirichlet function.

## Loading required package: spatstat.data
## Loading required package: nlme
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
## 
##     collapse
## The following object is masked from 'package:raster':
## 
##     getData
## Loading required package: rpart
## Registered S3 method overwritten by 'spatstat':
##   method     from
##   print.boxx cli
## 
## spatstat 1.63-3       (nickname: 'Wet paint') 
## For an introduction to spatstat, type 'beginner'
## 
## Note: spatstat version 1.63-3 is out of date by more than 3 months; we recommend upgrading to the latest version.
## 
## Attaching package: 'spatstat'
## The following object is masked from 'package:gstat':
## 
##     idw
## The following objects are masked from 'package:raster':
## 
##     area, rotate, shift
## Checking rgeos availability: TRUE
# Create a tessellated surface
th  <-  as(dirichlet(as.ppp(ptos_train)), "SpatialPolygons")

# The dirichlet function does not carry over projection information
# requiring that this information be added manually
crs(th) <- crs(ptos_train)
crs(caribe2) <- crs(ptos_train)
crs(th) 
## CRS arguments:
##  +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1
## +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m
## +no_defs
crs(ptos_train)
## CRS arguments:
##  +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1
## +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m
## +no_defs

In case of getting lost at any step, please review the corresponding sp and sf functionalities.

# The tessellated surface does not store attribute information
# from the point data layer. We'll use the over() function (from the sp
# package) to join the point attributes to the tesselated surface via
# a spatial join. The over() function creates a dataframe that will need to
# be added to the `th` object thus creating a SpatialPolygonsDataFrame object
th.z     <- over(th, ptos_train, fn=mean)
th.spdf  <-  SpatialPolygonsDataFrame(th, th.z)

# Finally, we'll clip the tessellated  surface to the AOI boundaries
th.clp   <- raster::intersect(caribe2,th.spdf)
# Map the data
tm_shape(th.clp) + 
  tm_polygons(col="rainfall", palette="RdBu", midpoint=30.0,
              title="Thiessen Polygons \nPredicted precipitation \n(in mm)") +
  tm_legend(legend.outside=TRUE)

4.2 Inverse Distance Weighted Interpolation

Many packages share the same function names. This can be a problem when these packages are loaded in a same R session. For example, the intersect function is available in the base, spatstat and raster packages–all of which are loaded in this current session. To ensure that the proper function is selected, it’s a good idea to preface the function name with the package name as in raster::intersect().

This tip will be used in the next chunk of code when calling the idw function which is available in both spatstat and gstat.

Note that the dirichlet function (like most functions in the spatsat package) require that the point object be in a ppp format hence the inline as.ppp(precip2) syntax.

The IDW output is a raster. This requires that we first create an empty raster grid, then interpolate the precipitation values to each unsampled grid cell. An IDW power value of 2 (idp=2.0) will be used.

# Create an empty grid where n is the total number of cells
grd              <- as.data.frame(spsample(ptos_train, "regular", n=500000))
# You need to figure out what is the expected size of the output grd
names(grd)       <- c("X", "Y")
coordinates(grd) <- c("X", "Y")
gridded(grd)     <- TRUE  # Create SpatialPixel object
fullgrid(grd)    <- TRUE  # Create SpatialGrid object

# Add P's projection information to the empty grid
proj4string(grd) <- proj4string(ptos_train)

# Interpolate the grid cells using a power value of 2 (idp=2.0)
P.idw <- gstat::idw(rainfall ~ 1, ptos_train, newdata=grd, idp=2.0)

# Convert to raster object then clip to AOI
r       <- raster(P.idw)
r
## class      : RasterLayer 
## dimensions : 870, 574, 499380  (nrow, ncol, ncell)
## resolution : 546.6352, 546.6352  (x, y)
## extent     : 735491.9, 1049261, 1268572, 1744145  (xmin, xmax, ymin, ymax)
## crs        : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## source     : memory
## names      : var1.pred 
## values     : 0.02547076, 87.04677  (min, max)
caribe2
## class       : SpatialPolygonsDataFrame 
## features    : 4 
## extent      : 730554.4, 1052004, 1263533, 1746903  (xmin, xmax, ymin, ymax)
## crs         : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## variables   : 10
## names       : ID_0, ISO,   NAME_0, ID_1,  NAME_1,       TYPE_1,  ENGTYPE_1, NL_NAME_1,    VARNAME_1,            area 
## min values  :   53, COL, Colombia,    5, Bolívar, Departamento, Department,        NA, La Magdalena, 10807536936.193 
## max values  :   53, COL, Colombia,   28,   Sucre, Departamento, Department,        NA, La Magdalena,  26034571857.66
r.m   <- raster::mask(r, caribe2)
r.m 
## class      : RasterLayer 
## dimensions : 870, 574, 499380  (nrow, ncol, ncell)
## resolution : 546.6352, 546.6352  (x, y)
## extent     : 735491.9, 1049261, 1268572, 1744145  (xmin, xmax, ymin, ymax)
## crs        : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## source     : memory
## names      : var1.pred 
## values     : 0.02547076, 87.04677  (min, max)
# Plot
tm_shape(r.m) + 
  tm_raster(n=10,palette = "RdBu", auto.palette.mapping = FALSE,
            title="Inverse Distance Weighted \nPredicted precipitation \n(in mm)") + 
  tm_shape(ptos_train) + tm_dots(size=0.02, col ="white") +
  tm_legend(legend.outside=TRUE)

A better visualization of the interpolated surface:

library(leaflet)
library(RColorBrewer)
pal <- colorNumeric(c("red", "orange", "yellow", "blue", "darkblue"), values(precip.mask),
  na.color = "transparent")

leaflet() %>% addTiles() %>%
  addRasterImage(r.m, colors = pal, opacity = 0.6) %>%
  addLegend(pal = pal, values = values(r.m),
    title = "IDW interpolated rainfall in the Caribbean from 26.04 to 30.04 in 2020 [mm]")

Fine-tuning the interpolation

The choice of power function can be subjective. To fine-tune the choice of the power parameter, you can perform a leave-one-out validation routine to measure the error in the interpolated values.

P <- ptos_train
IDW.out <- vector(length = length(P))
for (i in 1:length(P)) {
  IDW.out[i] <- gstat::idw(rainfall ~ 1, P[-i,], P[i,], idp=2.0)$var1.pred
}
# Plot the differences
OP <- par(pty="s", mar=c(4,3,0,0))
  plot(IDW.out ~ P$rainfall, asp=1, xlab="Observed", ylab="Predicted", pch=16,
       col=rgb(0,0,0,0.5))
  abline(lm(IDW.out ~ P$rainfall), col="red", lw=2,lty=2)
  abline(0,1)

par(OP)

The root mean square error (RMSE) can be computed from IDW.out as follows:

# Compute RMSE
sqrt( sum((IDW.out - P$rainfall)^2) / length(P))
## [1] 4.74758

Cross-validation

In addition to generating an interpolated surface, you can create a 95% confidence interval map of the interpolation model. Here we’ll create a 95% CI map from an IDW interpolation that uses a power parameter of 2 (idp=2.0).

As a cross-validation with all points can be lenghty process, we will illustrate it on a low percentage of the points (e.g. 10%). It will make the process faster:

index = sample(1:nrow(ptos_train), 0.1 * nrow(ptos_train))
(P <- ptos_train[index, ])
## class       : SpatialPointsDataFrame 
## features    : 191 
## extent      : 740814.1, 1038585, 1290710, 1744186  (xmin, xmax, ymin, ymax)
## Warning in proj4string(x): CRS object has comment, which is lost in output
## crs         : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## variables   : 2
## names       :              rain, rainfall 
## min values  : 0.154373064637184,      0.2 
## max values  :  87.1958770751953,     87.2

Conduct the jacknife technique:

# Implementation of a jackknife technique to estimate 
# a confidence interval at each unsampled point.
# Create the interpolated surface
img <- gstat::idw(rainfall~1, P, newdata=grd, idp=2.0)
n   <- length(P)
Zi  <- matrix(nrow = length(img$var1.pred), ncol = n)

# Remove a point then interpolate (do this n times for each point)
st <- stack()
for (i in 1:n){
  Z1 <- gstat::idw(rainfall~1, P[-i,], newdata=grd, idp=2.0)
  st <- addLayer(st,raster(Z1,layer=1))
  # Calculated pseudo-value Z at j
  Zi[,i] <- n * img$var1.pred - (n-1) * Z1$var1.pred
}

# Jackknife estimator of parameter Z at location j
Zj <- as.matrix(apply(Zi, 1, sum, na.rm=T) / n )

# Compute (Zi* - Zj)^2
c1 <- apply(Zi,2,'-',Zj)            # Compute the difference
c1 <- apply(c1^2, 1, sum, na.rm=T ) # Sum the square of the difference

# Compute the confidence interval
CI <- sqrt( 1/(n*(n-1)) * c1)

# Create (CI / interpolated value) raster
img.sig   <- img
img.sig$v <- CI /img$var1.pred 
# Clip the confidence raster to AOI
r <- raster(img.sig, layer="v")
r.m <- raster::mask(r, caribe2)

# Plot the map
tm_shape(r.m) + tm_raster(n=7,title="IDW\n95% confidence interval \n(in mm)") +
  tm_shape(P) + tm_dots(size=0.02) +
  
  tm_legend(legend.outside=TRUE)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

4.3 1st order polynomial fit

To fit a first order polynomial model of the form \(precip = intercept + aX +bY\)

# Again define P
P <- ptos_train

# Define the 1st order polynomial equation
f.1 <- as.formula(rainfall ~ X + Y) 
 
# Add X and Y to P
P$X <- coordinates(P)[,1]
P$Y <- coordinates(P)[,2]

# Run the regression model
lm.1 <- lm( f.1, data=P)

# Use the regression model output to interpolate the surface
dat.1st <- SpatialGridDataFrame(grd, data.frame(var1.pred = predict(lm.1, newdata=grd))) 
# Clip the interpolated raster to the Caribbean
r   <- raster(dat.1st)
r.m <- raster::mask(r, caribe2)

# Plot the map
tm_shape(r.m) + 
  tm_raster(n=10, palette="RdBu", auto.palette.mapping=FALSE, midpoint = NA,
            title="1st order polinomial fir \nPredicted precipitation \n(in mm)") +
  tm_shape(P) + tm_dots(size=0.02, col="white") +
  tm_legend(legend.outside=TRUE)

4.4 2nd order polynomial fit

To fit a second order polynomial model of the form \(precip = intercept + aX +bY +dX^2 +eY^2 +fXY\)

# Define the 2nd order polynomial equation
f.2 <- as.formula(rainfall ~ X + Y + I(X*X)+I(Y*Y) + I(X*Y))

# Add X and Y to P
P$X <- coordinates(P)[,1]
P$Y <- coordinates(P)[,2]
 
# Run the regression model
lm.2 <- lm( f.2, data=P)

# Use the regression model output to interpolate the surface
dat.2nd <- SpatialGridDataFrame(grd, data.frame(var1.pred = predict(lm.2, newdata=grd))) 
# Clip the interpolated raster to Texas
r   <- raster(dat.2nd)
r.m <- raster::mask(r, caribe2)

# Plot the map
tm_shape(r.m) + 
  tm_raster(n=10, palette="RdBu", auto.palette.mapping=FALSE,midpoint = NA,
            title="2nd order polinomial fit\nPredicted precipitation \n(in mm)") +
  tm_shape(P) + tm_dots(size=0.02, col="white") +
  tm_legend(legend.outside=TRUE)

4.5 Kriging

Fit the variogram model

First, we need to create a variogram model. Note that the variogram model is computed on the de-trended data. This is implemented in the following chunk of code by passing the 1st order trend model (defined in an earlier code chunk as formula object f.1) to the variogram function.

# Define the 1st order polynomial equation
f.1 <- as.formula(rainfall ~ X + Y) 

# Compute the sample variogram; note that the f.1 trend model is one of the
# parameters passed to variogram(). This tells the function to create the 
# variogram on the de-trended data.
var.smpl <- variogram(f.1, P, cloud = FALSE, cutoff=200000, width=10000)

# Compute the variogram model by passing the nugget, sill and range values
# to fit.variogram() via the vgm() function.
dat.fit  <- fit.variogram(var.smpl, fit.ranges = TRUE, fit.sills = TRUE,
                          vgm(psill=5, model="Gau", range=120000, nugget=-0.2))

## model can be Exp, Gau, Sph, Mat
## See gstat package - vgm function - to get the names of these models
## Review kriging theory to understand their meaning
dat.fit
##   model      psill    range
## 1   Nug   7.850546     0.00
## 2   Gau 192.614371 86720.79
# The following plot allows us to assess the fit
plot(var.smpl, dat.fit, xlim=c(0,150000))

Generate Kriged surface

Next, use the variogram model dat.fit to generate a kriged interpolated surface. The krige function allows us to include the trend model thus saving us from having to de-trend the data, krige the residuals, then combine the two rasters. Instead, all we need to do is pass krige the trend formula f.1.

# Define the trend model
f.1 <- as.formula(rainfall ~ X + Y) 

# Perform the krige interpolation (note the use of the variogram model
# created in the earlier step)
dat.krg <- krige( f.1, P, grd, dat.fit)
## [using universal kriging]
# Convert kriged surface to a raster object for clipping
r <- raster(dat.krg)
r.m <- raster::mask(r, caribe2)
r.m
## class      : RasterLayer 
## dimensions : 870, 574, 499380  (nrow, ncol, ncell)
## resolution : 546.6352, 546.6352  (x, y)
## extent     : 735491.9, 1049261, 1268572, 1744145  (xmin, xmax, ymin, ymax)
## crs        : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## source     : /private/var/folders/l_/1nv5l0dx78g_d3k3_t85lrjr0000gn/T/RtmpOjOeVe/raster/r_tmp_2020-06-10_111306_32907_58559.grd 
## names      : var1.pred 
## values     : -6.146191, 78.79895  (min, max)
# Plot the map
tm_shape(r.m) + 
  tm_raster(n=10, palette="RdBu", auto.palette.mapping=FALSE, midpoint = NA,
            title="Universal Kriging\nPredicted precipitation \n(in mm)") +
  tm_shape(P) + tm_dots(size=0.02, col="white") +
  tm_legend(legend.outside=TRUE)

A better visualization of the interpolated surface:

library(leaflet)
library(RColorBrewer)
pal <- colorNumeric(c("red", "orange", "yellow", "blue", "darkblue"), values(precip.mask),
  na.color = "transparent")

leaflet() %>% addTiles() %>%
  addRasterImage(r.m, colors = pal, opacity = 0.6) %>%
  addLegend(pal = pal, values = values(r.m),
    title = "Kriging interpolated rainfall in the Caribbean from 26.04 to 30.04.2020 [mm]")

Generate the variance and confidence interval maps

The dat.krg object stores not just the interpolated values, but the variance values as well. These can be passed to the raster object for mapping as follows:

r   <- raster(dat.krg, layer="var1.var")
r.m <- raster::mask(r, caribe2)

tm_shape(r.m) + 
  tm_raster(n=7, palette ="Reds",
            title="Kriging Interpolation\nVariance map \n(in squared mm)") +tm_shape(P) + tm_dots(size=0.02) +
  tm_legend(legend.outside=TRUE)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

A more easily interpretable map is the 95% confidence interval map which can be generated from the variance object as follows (the map values should be interpreted as the number of mm above and below the estimated rainfall amount).

r   <- sqrt(raster(dat.krg, layer="var1.var")) * 1.96
r.m <- raster::mask(r, caribe2)

tm_shape(r.m) + 
  tm_raster(n=7, palette ="Reds",
            title="Kriging Interpolation\n95% CI map \n(in mm)") +tm_shape(P) + tm_dots(size=0.02, col="white") +
  tm_legend(legend.outside=TRUE)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

I did not interpret the output of each interpolation method. I did not compare the different outputs. These are very important tasks for your report.

sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] maptools_0.9-9       spatstat_1.63-3      rpart_4.1-15        
##  [4] nlme_3.1-147         spatstat.data_1.4-3  leaflet.extras_1.0.0
##  [7] htmltools_0.4.0      RColorBrewer_1.1-2   leaflet_2.0.3       
## [10] gstat_2.0-6          tmap_3.0             forcats_0.5.0       
## [13] stringr_1.4.0        dplyr_0.8.5          purrr_0.3.4         
## [16] readr_1.3.1          tidyr_1.0.3          tibble_3.0.1        
## [19] ggplot2_3.3.0        tidyverse_1.3.0      sf_0.9-3            
## [22] raster_3.1-5         rgdal_1.4-8          sp_1.4-2            
## 
## loaded via a namespace (and not attached):
##  [1] fs_1.4.1                xts_0.12-0              lubridate_1.7.8        
##  [4] httr_1.4.1              tools_3.6.3             backports_1.1.7        
##  [7] R6_2.4.1                KernSmooth_2.23-17      rgeos_0.5-3            
## [10] mgcv_1.8-31             DBI_1.1.0               colorspace_1.4-1       
## [13] withr_2.2.0             tidyselect_1.1.0        compiler_3.6.3         
## [16] leafem_0.1.1            cli_2.0.2               rvest_0.3.5            
## [19] xml2_1.3.2              scales_1.1.1            classInt_0.4-3         
## [22] goftest_1.2-2           digest_0.6.25           foreign_0.8-75         
## [25] spatstat.utils_1.17-0   rmarkdown_2.1           base64enc_0.1-3        
## [28] dichromat_2.0-0         pkgconfig_2.0.3         dbplyr_1.4.3           
## [31] htmlwidgets_1.5.1       rlang_0.4.6             readxl_1.3.1           
## [34] rstudioapi_0.11         FNN_1.1.3               farver_2.0.3           
## [37] generics_0.0.2          zoo_1.8-8               jsonlite_1.6.1         
## [40] crosstalk_1.1.0.1       magrittr_1.5            Matrix_1.2-18          
## [43] Rcpp_1.0.4.6            munsell_0.5.0           fansi_0.4.1            
## [46] abind_1.4-5             lifecycle_0.2.0         stringi_1.4.6          
## [49] leafsync_0.1.0          yaml_2.2.1              tmaptools_3.0          
## [52] grid_3.6.3              parallel_3.6.3          crayon_1.3.4           
## [55] deldir_0.1-25           lattice_0.20-41         splines_3.6.3          
## [58] stars_0.4-2             haven_2.2.0             tensor_1.5             
## [61] hms_0.5.3               knitr_1.28              pillar_1.4.4           
## [64] spacetime_1.2-3         codetools_0.2-16        reprex_0.3.0           
## [67] XML_3.99-0.3            glue_1.4.1              evaluate_0.14          
## [70] leaflet.providers_1.9.0 modelr_0.1.6            vctrs_0.3.0            
## [73] png_0.1-7               cellranger_1.1.0        polyclip_1.10-0        
## [76] gtable_0.3.0            assertthat_0.2.1        xfun_0.14              
## [79] lwgeom_0.2-3            broom_0.5.6             e1071_1.7-3            
## [82] class_7.3-17            viridisLite_0.3.0       intervals_0.15.2       
## [85] units_0.6-6             ellipsis_0.3.1