1 Overview

Spatial interpolation has been widely and commonly used in many studies to create surface data based on a set of sampled points, such as soil properties, temperature, and precipitation. Simple approach like inverse distance weighted algorithm or the well known kriging procedures have been routinely been applied in many studies. In this hands-on exercise, you will gain hands-on experience on using appropriate R functions to perform spatial interpolation by using these two popular methods.

1.1 The research questions

The specific questions we would like to answer are as follows:

  • given a distribution of point meteorological stations showing rainfall values, how I can I estimate the rainfall values where data were not observed?

  • after deriving the estimate values, can I determine the confident intervals of these values?

1.2 The data

To provide answers to the questions above, two data sets will be used. They are:

  • CostalOutline, a polygon feature data showing the national boundary of Singapore. It is provided by SLA and is in ESRI shapefile format.
  • Monthly_rainfall_202009, a csv file provides monthly total rainfall records by rainfall station. There are four fields in this data file, namely: Station, Latitude, Longitude, and TMR. The Latitude and Longitude are given in decimal degree and is in WGS84 geographic projection system. The TMR field provides Total Monthly Rainfall in millimeter (mm).

2 Installing and Loading the R packages

In this hands-on exercise, three new R packages will be used, they are:

  • gstat, a programme for spatial and spatio-temporal geostatistical modelling, prediction and simulation.

  • automap, a programme for performing an automatic interpolation by automatically estimating the variogram and then calling gstat.

  • raster which reads, writes, manipulates, analyses and model of gridded spatial data (i.e. raster).

  • sf, a package that specially design for handling, processing and wrangling simple features in R.

  • sp, Classes and methods for spatial data.

  • tmap, a package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and proportional symbol maps.

  • tidyverse provide a family of R package for perfroming data science tasks.

The code chunk below will be used to check if the above packages have been installed in R. If they have yet to be installed, they will be installed. After insatlling all these packages, the will be launched in RStudio by using library().

packages = c('sf', 'sp', 'raster', 'gstat', 
             'automap','tmap', 'tidyverse')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}

3 Geospatial Data Wrangling

3.1 Importing the spatial data

The CoastalOutline geospatial data is in ESRI shapefile format. The code chunk below uses st_read() of st package to import the geospatial data in R and save it as simple feature data.frame.

sg <- st_read(dsn = "data/geospatial",
              layer="CostalOutline", 
              crs = 3414)
## Reading layer `CostalOutline' from data source `D:\IS415-AY2020-21T1\03-Hands-on Exercises\Hands-on_Ex12\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 60 features and 4 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 2663.926 ymin: 16357.98 xmax: 56047.79 ymax: 50244.03
## projected CRS:  SVY21 / Singapore TM

Note that crs argument of st_read() to specify the geospatial data is in svy21 projected coordinate system.

You can double confirm the projection by using crs() as shown below.

crs(sg)
## CRS arguments:
##  +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1
## +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0
## +units=m +no_defs

3.1.1 Checking the structure of the geospatial data

Next, the code chunk below will be used to display the data structure of the simple feature data.frame.

glimpse(sg)
## Rows: 60
## Columns: 5
## $ GDO_GID    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
## $ MSLINK     <dbl> 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,...
## $ MAPID      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ COSTAL_NAM <chr> "Linkway", "SENTOSA", "PULAU SARIMBUN", "PULAU SAMULUN",...
## $ geometry   <POLYGON [m]> POLYGON ((14362.86 32307.49..., POLYGON ((25683....

3.1.2 Visualising the imported

We can also display the geospatial data as a map by using tmap() package.

tm_shape(sg)+
  tm_polygons()

4 Aspatial Data Wrangling

In this section, we will import the an aspatial data called Monthly_rainfall_202009.csv in R and then convert it into a simple point feature data.frame.

4.1 Data Import and Preparation

The code chunk below uses read_csv() function of readr package to import Monthly_rainfall_202009.csv into R as a tibble data frame called rainfall.

rainfall <- read_csv("data/aspatial/Monthly_rainfall_202009.csv")

After importing the data file into R, it is important for us to examine if the data file has been imported correctly.

The code chunk below shows list() is used to do the job.

list(rainfall) 
## [[1]]
## # A tibble: 63 x 4
##    Station          Latitude Longitude   TMR
##    <chr>               <dbl>     <dbl> <dbl>
##  1 Admiralty            1.44      104.  253.
##  2 Admiralty (West)     1.46      104.   NA 
##  3 Ang Mo Kio           1.38      104.  235.
##  4 Boon Lay (East)      1.33      104.   NA 
##  5 Boon Lay (West)      1.33      104.   NA 
##  6 Botanic Garden       1.31      104.  338.
##  7 Buangkok             1.38      104.   NA 
##  8 Bukit Panjang        1.38      104.  351 
##  9 Bukit Timah          1.32      104.  299.
## 10 Buona Vista          1.28      104.  409.
## # ... with 53 more rows

Notice that the rainfall data in tibble data frame and not the common R data frame.

4.2 Creating a sf data frame from an aspatial data frame

The code chunk below converts rainfall data frame into a simple feature data frame by using st_as_sf() of sf packages

rainfall_sf <- st_as_sf(rainfall, 
                       coords = c("Longitude", "Latitude"),
                       crs= 4326) %>%
  st_transform(crs = 3414)

Things to learn from the arguments above:

  • The coords argument requires you to provide the column name of the x-coordinates first then followed by the column name of the y-coordinates.
  • The crs argument required you to provide the coordinates system in epsg format. EPSG: 3414 is Singapore SVY21 Projected Coordinate System. You can search for other country’s epsg code by refering to epsg.io.

Figure below shows the data table of rainfall_sf. Notice that a new column called geometry has been added into the data frame.

You can display the basic information of the newly created rainfall_sf by using the code chunk below.

list(rainfall_sf)
## [[1]]
## Simple feature collection with 63 features and 2 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 4081.375 ymin: 19099.2 xmax: 44613.25 ymax: 48865.9
## projected CRS:  SVY21 / Singapore TM
## # A tibble: 63 x 3
##    Station            TMR            geometry
##  * <chr>            <dbl>         <POINT [m]>
##  1 Admiralty         253.  (22667.41 47284.7)
##  2 Admiralty (West)   NA   (23769.15 48865.9)
##  3 Ang Mo Kio        235. (29767.41 39820.84)
##  4 Boon Lay (East)    NA  (15444.45 34712.56)
##  5 Boon Lay (West)    NA   (13630.41 34414.1)
##  6 Botanic Garden    338. (26295.19 32334.92)
##  7 Buangkok           NA   (33862.77 40628.1)
##  8 Bukit Panjang     351  (19873.96 40484.41)
##  9 Bukit Timah       299.  (26417.61 33484.9)
## 10 Buona Vista       409. (23023.19 29614.82)
## # ... with 53 more rows

The output shows that rainfall_sf is in point feature class. It’s epsg ID is 3414. The bbox provides information of the extend of the geospatial data.

We can also prepare a point value map by using the code chunk below.

tm_shape(sg) + 
  tm_polygons() +
tm_shape(rainfall_sf) +
  tm_dots(col="TMR",
          palette = "Blues", auto.palette.mapping = FALSE,
          title="Sampled rainfall \n(in mm)", size=0.7) +
  tm_text("TMR", just="left", xmod=.5, size = 0.7) +
tm_legend(legend.outside=TRUE)

5 Deterministic Approach

5.1 Inverse Distance Weighted (IDW)

The IDW output is a raster. This requires that we first create an empty raster grid, then interpolate the precipitation values to each unsampled grid cell. An IDW power value of 2 (idp=2.0) will be used.

# Create an empty grid where n is the total number of cells
sg_sp <- as_Spatial(sg)
rainfall_sp <- rainfall_sf %>%
  filter(TMR > 0) %>%
  as_Spatial()

rainfall_sp@bbox <- sg_sp@bbox
grd <- as.data.frame(spsample(rainfall_sp, "regular", n=50000))
names(grd) <- c("X", "Y")
coordinates(grd) <- c("X", "Y")
gridded(grd) <- TRUE # Create SpatialPixel object
fullgrid(grd) <- TRUE # Create SpatialGrid object
# Add P's projection information to the empty grid
proj4string(rainfall_sp) <- proj4string(rainfall_sp) # Temp fix until new proj env is adopted
proj4string(grd) <- proj4string(rainfall_sp)
plot(grd)

The code chunk below uses idw() of gstat package to derive the rainfall surface layer.

P.idw <- gstat::idw(TMR ~ 1, rainfall_sp, newdata=grd, idp=2.0)
## [inverse distance weighted interpolation]

Notice that the power factor used is 2.

5.1.1 Converting to raster object then clip to SG

Before we can plot the output as a surface map using tmap package we need convert the output grid layer into R’s raster format. We will also clip the output raster using Singapore coastal outline in order to ensure that the final plot will only confine within Singapore coastal outline.

r <- raster(P.idw)
r.m <- mask(r, sg_sp)

5.1.2 Visualising the IDW surface map

Next, we will plot the interpolated grid data by using the code chunk below.

tm_shape(r.m) +
  tm_raster(n=10,
            palette = "Blues",
            midpoint = TRUE,
            title="Predicted precipitation \n(in inches)") +
tm_shape(rainfall_sp) + 
  tm_dots(size=0.1) +
tm_legend(legend.outside=TRUE)

5.1.3 Confident interval map

In addition to generating an interpolated surface, you can create a 95% confidence interval map of the interpolation model. Here we’ll create a 95% CI map from an IDW interpolation that uses a power parameter of 2 (idp=2.0).

img <- gstat::idw(TMR~1, 
                  rainfall_sp, 
                  newdata=grd, 
                  idp=2.0)
## [inverse distance weighted interpolation]
n <- length(rainfall_sp)
Zi <- matrix(nrow = length(img$var1.pred), 
             ncol = n)

Next, we will remove a point then interpolate (do this n times for each point) by using the code chunk below.

st <- stack()
for (i in 1:n){
  Z1 <- gstat::idw(TMR~1, 
                   rainfall_sp[-i,], 
                   newdata=grd, 
                   idp=2.0)
st <- addLayer(st,raster(Z1,layer=1))

# Calculated pseudo-value Z at j
Zi[,i] <- n * img$var1.pred - (n-1) * Z1$var1.pred
}
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]

This is follow by calculating Jackknife estimator of parameter Z at location j by using the code chunk below.

Zj <- as.matrix(apply(Zi, 1, sum, na.rm=T) / n )
# Compute (Zi* - Zj)^2
c1 <- apply(Zi,2,'-',Zj) # Compute the difference
c1 <- apply(c1^2, 1, sum, na.rm=T ) # Sum the square of the difference

Next, we will compute the confidence interval by using the code chunk below.

CI <- sqrt( 1/(n*(n-1)) * c1)

In order to display the CI output on a map, we will create (CI / interpolated value) raster by using the code chunk below.

img.sig <- img
img.sig$v <- CI /img$var1.pred

We will then use the code chunk below to clip the output raster within the coastal outline of Singapore.

r <- raster(img.sig, layer="v")
r.m <- mask(r, sg_sp)

Now, we can plot the CI map by using the code chunk below.

tm_shape(r.m) + 
  tm_raster(n=8,
            title="95% confidence interval \n(in mm)") +
tm_shape(rainfall_sp) + 
  tm_dots(size=0.1) +
tm_legend(legend.outside=TRUE)

6 Geostatistical Method

6.1 Fit the variogram model

First, we need to create a variogram model. Note that the variogram model is computed on the de-trended data. This is implemented in the following chunk of code by passing the 1 order trend model (defined in an earlier code chunk as formula object f.1 ) to the variogram function.

6.1.1 Define the variogram equation

To begin, we will use formula() to define the variogram equation.

f.1 <- as.formula(TMR ~ 1)
rainfall_sp$X <- coordinates(rainfall_sp)[,1]
rainfall_sp$Y <- coordinates(rainfall_sp)[,2]

Notice that the code chunk above also add the x-coordinates and y-coordinates to rainfall_sp SpatialPointDataFrame.

6.1.2 Fitting the variogram

There are several ways to fit a variogram model to an empirical variogram. We will use the simplest one—automatic fitting using function autofitVariogram() from package automap.

rfVariogram <- autofitVariogram(f.1, rainfall_sp)

The function chooses the best fitting type of model, and also fine tunes its parameters. (Use show.vgms() to display variogram model types.) Note that the autofitVariogram function does not work on sf objects, which is why we convert the object to a SpatialPointsDataFrame (package sp).

The fitted model can be plotted by using the code chunk below.

plot(rfVariogram)

You can also using the model argument of autofitVariogram to fit different models as shown in the code chunk below.

rf_sph <- autofitVariogram(f.1, 
                           rainfall_sp,
                           model = c("Sph"))
rf_exp <- autofitVariogram(f.1, 
                           rainfall_sp,
                           model = c("Exp"))
rf_gau <- autofitVariogram(f.1, 
                           rainfall_sp,
                           model = c("Gau"))
rf_ste <- autofitVariogram(f.1, 
                           rainfall_sp,
                           model = c("Ste"))

Then, plot them out for comparison.

par(mfrow=c(2,2))
plot(rf_sph)

plot(rf_exp)

plot(rf_gau)

plot(rf_ste)

6.2 Estimating the values

The resulting object of autofitVariogram() is actually a list with several components, including the empirical variogram and the fitted variogram model. The $var_model component of the resulting object contains the actual model:

rfVariogram$var_model
##   model     psill    range kappa
## 1   Nug  1899.784     0.00     0
## 2   Ste 23042.918 39877.66    10

The variogram model can then be passed to the gstat(), and we can carry on with the Ordinary Kriging interpolation:

rf.krg <- krige(f.1, 
                rainfall_sp, 
                grd, 
                rfVariogram$var_model)
## [using ordinary kriging]

We can plot the output by using the code chunk below.

plot(rf.krg)

6.2.1 Converting kriged surface to a raster object for clipping

The plot above is only good for quick check. We will now concert the output grid into a raster object by using the code chunk below.

r <- raster(rf.krg)
r.m <- mask(r, sg_sp)

6.3 Plot the kriging map

Now, we can plot the kriging map using appropriate mapping functions of tmap package as shown in the code chunk below.

tm_shape(r.m) +
  tm_raster(n=10, 
            palette="Blues",
            auto.palette.mapping=FALSE,
title="Predicted rainfall \n(in mm)") +
tm_shape(rainfall_sp) + 
  tm_dots(size=0.1) +
tm_legend(legend.outside=TRUE)

6.3.1 Generate the variance and confidence interval maps

The rf.krg object stores not just the interpolated values, but the variance values as well. These can be passed to the raster object for mapping as follows:

r <- raster(rf.krg, layer="var1.var")
r.m <- mask(r, sg_sp)

tm_shape(r.m) +
tm_raster(n=7, 
          palette ="Reds",
          title="Variance map \n(in squared mm)") +
tm_shape(rainfall_sp) + 
  tm_dots(size=0.1) +
tm_legend(legend.outside=TRUE)

Are more readily interpretable map is the 95% confidence interval map which can be generated from the variance object as follows (the map values should be interpreted as the number of inches above and below the estimated rainfall amount).

r <- sqrt(raster(rf.krg, layer="var1.var")) * 1.96
r.m <- mask(r, sg_sp)

tm_shape(r.m) +
tm_raster(n=7, 
          palette ="Reds",
          title="95% CI map \n(in mm)") +
tm_shape(rainfall_sp) + 
  tm_dots(size=0.1) +
tm_legend(legend.outside=TRUE)