Spatial interpolation has been widely and commonly used in many studies to create surface data based on a set of sampled points, such as soil properties, temperature, and precipitation. Simple approach like inverse distance weighted algorithm or the well known kriging procedures have been routinely been applied in many studies. In this hands-on exercise, you will gain hands-on experience on using appropriate R functions to perform spatial interpolation by using these two popular methods.
The specific questions we would like to answer are as follows:
given a distribution of point meteorological stations showing rainfall values, how I can I estimate the rainfall values where data were not observed?
after deriving the estimate values, can I determine the confident intervals of these values?
To provide answers to the questions above, two data sets will be used. They are:
In this hands-on exercise, three new R packages will be used, they are:
gstat, a programme for spatial and spatio-temporal geostatistical modelling, prediction and simulation.
automap, a programme for performing an automatic interpolation by automatically estimating the variogram and then calling gstat.
raster which reads, writes, manipulates, analyses and model of gridded spatial data (i.e. raster).
sf, a package that specially design for handling, processing and wrangling simple features in R.
sp, Classes and methods for spatial data.
tmap, a package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and proportional symbol maps.
tidyverse provide a family of R package for perfroming data science tasks.
The code chunk below will be used to check if the above packages have been installed in R. If they have yet to be installed, they will be installed. After insatlling all these packages, the will be launched in RStudio by using library().
packages = c('sf', 'sp', 'raster', 'gstat',
'automap','tmap', 'tidyverse')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
The CoastalOutline geospatial data is in ESRI shapefile format. The code chunk below uses st_read() of st package to import the geospatial data in R and save it as simple feature data.frame.
sg <- st_read(dsn = "data/geospatial",
layer="CostalOutline",
crs = 3414)
## Reading layer `CostalOutline' from data source `D:\IS415-AY2020-21T1\03-Hands-on Exercises\Hands-on_Ex12\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 60 features and 4 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 2663.926 ymin: 16357.98 xmax: 56047.79 ymax: 50244.03
## projected CRS: SVY21 / Singapore TM
Note that crs argument of st_read() to specify the geospatial data is in svy21 projected coordinate system.
You can double confirm the projection by using crs() as shown below.
crs(sg)
## CRS arguments:
## +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1
## +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0
## +units=m +no_defs
Next, the code chunk below will be used to display the data structure of the simple feature data.frame.
glimpse(sg)
## Rows: 60
## Columns: 5
## $ GDO_GID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
## $ MSLINK <dbl> 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,...
## $ MAPID <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ COSTAL_NAM <chr> "Linkway", "SENTOSA", "PULAU SARIMBUN", "PULAU SAMULUN",...
## $ geometry <POLYGON [m]> POLYGON ((14362.86 32307.49..., POLYGON ((25683....
We can also display the geospatial data as a map by using tmap() package.
tm_shape(sg)+
tm_polygons()
In this section, we will import the an aspatial data called Monthly_rainfall_202009.csv in R and then convert it into a simple point feature data.frame.
The code chunk below uses read_csv() function of readr package to import Monthly_rainfall_202009.csv into R as a tibble data frame called rainfall.
rainfall <- read_csv("data/aspatial/Monthly_rainfall_202009.csv")
After importing the data file into R, it is important for us to examine if the data file has been imported correctly.
The code chunk below shows list() is used to do the job.
list(rainfall)
## [[1]]
## # A tibble: 63 x 4
## Station Latitude Longitude TMR
## <chr> <dbl> <dbl> <dbl>
## 1 Admiralty 1.44 104. 253.
## 2 Admiralty (West) 1.46 104. NA
## 3 Ang Mo Kio 1.38 104. 235.
## 4 Boon Lay (East) 1.33 104. NA
## 5 Boon Lay (West) 1.33 104. NA
## 6 Botanic Garden 1.31 104. 338.
## 7 Buangkok 1.38 104. NA
## 8 Bukit Panjang 1.38 104. 351
## 9 Bukit Timah 1.32 104. 299.
## 10 Buona Vista 1.28 104. 409.
## # ... with 53 more rows
Notice that the rainfall data in tibble data frame and not the common R data frame.
The code chunk below converts rainfall data frame into a simple feature data frame by using st_as_sf() of sf packages
rainfall_sf <- st_as_sf(rainfall,
coords = c("Longitude", "Latitude"),
crs= 4326) %>%
st_transform(crs = 3414)
Things to learn from the arguments above:
Figure below shows the data table of rainfall_sf. Notice that a new column called geometry has been added into the data frame.
You can display the basic information of the newly created rainfall_sf by using the code chunk below.
list(rainfall_sf)
## [[1]]
## Simple feature collection with 63 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 4081.375 ymin: 19099.2 xmax: 44613.25 ymax: 48865.9
## projected CRS: SVY21 / Singapore TM
## # A tibble: 63 x 3
## Station TMR geometry
## * <chr> <dbl> <POINT [m]>
## 1 Admiralty 253. (22667.41 47284.7)
## 2 Admiralty (West) NA (23769.15 48865.9)
## 3 Ang Mo Kio 235. (29767.41 39820.84)
## 4 Boon Lay (East) NA (15444.45 34712.56)
## 5 Boon Lay (West) NA (13630.41 34414.1)
## 6 Botanic Garden 338. (26295.19 32334.92)
## 7 Buangkok NA (33862.77 40628.1)
## 8 Bukit Panjang 351 (19873.96 40484.41)
## 9 Bukit Timah 299. (26417.61 33484.9)
## 10 Buona Vista 409. (23023.19 29614.82)
## # ... with 53 more rows
The output shows that rainfall_sf is in point feature class. It’s epsg ID is 3414. The bbox provides information of the extend of the geospatial data.
We can also prepare a point value map by using the code chunk below.
tm_shape(sg) +
tm_polygons() +
tm_shape(rainfall_sf) +
tm_dots(col="TMR",
palette = "Blues", auto.palette.mapping = FALSE,
title="Sampled rainfall \n(in mm)", size=0.7) +
tm_text("TMR", just="left", xmod=.5, size = 0.7) +
tm_legend(legend.outside=TRUE)
The IDW output is a raster. This requires that we first create an empty raster grid, then interpolate the precipitation values to each unsampled grid cell. An IDW power value of 2 (idp=2.0) will be used.
# Create an empty grid where n is the total number of cells
sg_sp <- as_Spatial(sg)
rainfall_sp <- rainfall_sf %>%
filter(TMR > 0) %>%
as_Spatial()
rainfall_sp@bbox <- sg_sp@bbox
grd <- as.data.frame(spsample(rainfall_sp, "regular", n=50000))
names(grd) <- c("X", "Y")
coordinates(grd) <- c("X", "Y")
gridded(grd) <- TRUE # Create SpatialPixel object
fullgrid(grd) <- TRUE # Create SpatialGrid object
# Add P's projection information to the empty grid
proj4string(rainfall_sp) <- proj4string(rainfall_sp) # Temp fix until new proj env is adopted
proj4string(grd) <- proj4string(rainfall_sp)
plot(grd)
The code chunk below uses idw() of gstat package to derive the rainfall surface layer.
P.idw <- gstat::idw(TMR ~ 1, rainfall_sp, newdata=grd, idp=2.0)
## [inverse distance weighted interpolation]
Notice that the power factor used is 2.
Before we can plot the output as a surface map using tmap package we need convert the output grid layer into R’s raster format. We will also clip the output raster using Singapore coastal outline in order to ensure that the final plot will only confine within Singapore coastal outline.
r <- raster(P.idw)
r.m <- mask(r, sg_sp)
Next, we will plot the interpolated grid data by using the code chunk below.
tm_shape(r.m) +
tm_raster(n=10,
palette = "Blues",
midpoint = TRUE,
title="Predicted precipitation \n(in inches)") +
tm_shape(rainfall_sp) +
tm_dots(size=0.1) +
tm_legend(legend.outside=TRUE)
In addition to generating an interpolated surface, you can create a 95% confidence interval map of the interpolation model. Here we’ll create a 95% CI map from an IDW interpolation that uses a power parameter of 2 (idp=2.0).
img <- gstat::idw(TMR~1,
rainfall_sp,
newdata=grd,
idp=2.0)
## [inverse distance weighted interpolation]
n <- length(rainfall_sp)
Zi <- matrix(nrow = length(img$var1.pred),
ncol = n)
Next, we will remove a point then interpolate (do this n times for each point) by using the code chunk below.
st <- stack()
for (i in 1:n){
Z1 <- gstat::idw(TMR~1,
rainfall_sp[-i,],
newdata=grd,
idp=2.0)
st <- addLayer(st,raster(Z1,layer=1))
# Calculated pseudo-value Z at j
Zi[,i] <- n * img$var1.pred - (n-1) * Z1$var1.pred
}
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
This is follow by calculating Jackknife estimator of parameter Z at location j by using the code chunk below.
Zj <- as.matrix(apply(Zi, 1, sum, na.rm=T) / n )
# Compute (Zi* - Zj)^2
c1 <- apply(Zi,2,'-',Zj) # Compute the difference
c1 <- apply(c1^2, 1, sum, na.rm=T ) # Sum the square of the difference
Next, we will compute the confidence interval by using the code chunk below.
CI <- sqrt( 1/(n*(n-1)) * c1)
In order to display the CI output on a map, we will create (CI / interpolated value) raster by using the code chunk below.
img.sig <- img
img.sig$v <- CI /img$var1.pred
We will then use the code chunk below to clip the output raster within the coastal outline of Singapore.
r <- raster(img.sig, layer="v")
r.m <- mask(r, sg_sp)
Now, we can plot the CI map by using the code chunk below.
tm_shape(r.m) +
tm_raster(n=8,
title="95% confidence interval \n(in mm)") +
tm_shape(rainfall_sp) +
tm_dots(size=0.1) +
tm_legend(legend.outside=TRUE)
First, we need to create a variogram model. Note that the variogram model is computed on the de-trended data. This is implemented in the following chunk of code by passing the 1 order trend model (defined in an earlier code chunk as formula object f.1 ) to the variogram function.
To begin, we will use formula() to define the variogram equation.
f.1 <- as.formula(TMR ~ 1)
rainfall_sp$X <- coordinates(rainfall_sp)[,1]
rainfall_sp$Y <- coordinates(rainfall_sp)[,2]
Notice that the code chunk above also add the x-coordinates and y-coordinates to rainfall_sp SpatialPointDataFrame.
There are several ways to fit a variogram model to an empirical variogram. We will use the simplest one—automatic fitting using function autofitVariogram() from package automap.
rfVariogram <- autofitVariogram(f.1, rainfall_sp)
The function chooses the best fitting type of model, and also fine tunes its parameters. (Use show.vgms() to display variogram model types.) Note that the autofitVariogram function does not work on sf objects, which is why we convert the object to a SpatialPointsDataFrame (package sp).
The fitted model can be plotted by using the code chunk below.
plot(rfVariogram)
You can also using the model argument of autofitVariogram to fit different models as shown in the code chunk below.
rf_sph <- autofitVariogram(f.1,
rainfall_sp,
model = c("Sph"))
rf_exp <- autofitVariogram(f.1,
rainfall_sp,
model = c("Exp"))
rf_gau <- autofitVariogram(f.1,
rainfall_sp,
model = c("Gau"))
rf_ste <- autofitVariogram(f.1,
rainfall_sp,
model = c("Ste"))
Then, plot them out for comparison.
par(mfrow=c(2,2))
plot(rf_sph)
plot(rf_exp)
plot(rf_gau)
plot(rf_ste)
The resulting object of autofitVariogram() is actually a list with several components, including the empirical variogram and the fitted variogram model. The $var_model component of the resulting object contains the actual model:
rfVariogram$var_model
## model psill range kappa
## 1 Nug 1899.784 0.00 0
## 2 Ste 23042.918 39877.66 10
The variogram model can then be passed to the gstat(), and we can carry on with the Ordinary Kriging interpolation:
rf.krg <- krige(f.1,
rainfall_sp,
grd,
rfVariogram$var_model)
## [using ordinary kriging]
We can plot the output by using the code chunk below.
plot(rf.krg)
The plot above is only good for quick check. We will now concert the output grid into a raster object by using the code chunk below.
r <- raster(rf.krg)
r.m <- mask(r, sg_sp)
Now, we can plot the kriging map using appropriate mapping functions of tmap package as shown in the code chunk below.
tm_shape(r.m) +
tm_raster(n=10,
palette="Blues",
auto.palette.mapping=FALSE,
title="Predicted rainfall \n(in mm)") +
tm_shape(rainfall_sp) +
tm_dots(size=0.1) +
tm_legend(legend.outside=TRUE)
The rf.krg object stores not just the interpolated values, but the variance values as well. These can be passed to the raster object for mapping as follows:
r <- raster(rf.krg, layer="var1.var")
r.m <- mask(r, sg_sp)
tm_shape(r.m) +
tm_raster(n=7,
palette ="Reds",
title="Variance map \n(in squared mm)") +
tm_shape(rainfall_sp) +
tm_dots(size=0.1) +
tm_legend(legend.outside=TRUE)
Are more readily interpretable map is the 95% confidence interval map which can be generated from the variance object as follows (the map values should be interpreted as the number of inches above and below the estimated rainfall amount).
r <- sqrt(raster(rf.krg, layer="var1.var")) * 1.96
r.m <- mask(r, sg_sp)
tm_shape(r.m) +
tm_raster(n=7,
palette ="Reds",
title="95% CI map \n(in mm)") +
tm_shape(rainfall_sp) +
tm_dots(size=0.1) +
tm_legend(legend.outside=TRUE)