Universal Kriging

Introduction

Universal Kriging (UK) presents an alternative to Ordinary Kriging, tailored to non-stationary scenarios where the mean exhibits deterministic variations across different locations (such as trends or drifts), while the variance remains constant. The nature of this trend can span from a local scale, capturing immediate neighborhood influences, to a global scale encompassing the entire area. This approach, known as second-order stationarity or “weak stationarity,” is frequently applicable in contexts involving environmental exposures.

In Universal Kriging, the first step typically involves the calculation of the trend, which is expressed as a function of geographical coordinates. Subsequently, this trend is combined with the remaining variation (referred to as residuals), modeled as a random field. The integration of the trend and residuals is crucial in generating the final predictions.

The UK model conceptualizes the value of a variable at a specific location as the composite of a regionally varying non-stationary trend and a locally correlated random component, which represents the deviations from the regional trend.

Load package

library(plyr)
library(dplyr)
library(gstat)
library(raster)
library(ggplot2)
library(car)
library(classInt)
library(RStoolbox)
library(spatstat)
library(dismo)
library(fields)
library(gridExtra)

Load Data

The soil organic carbon data (train and test data set) could be found here.

# Define data folder
dataFolder<-"D:\\Punjab University Courses\\pu notes\\Spatial Statistics\\Lecture 5\\geospatial-r-github.io-sites\\Data\\DATA_08\\"

train<-read.csv(paste0(dataFolder,"train_data.csv"), header= TRUE) 
test<-read.csv(paste0(dataFolder,"test_data.csv"), header= TRUE) 
state<-shapefile(paste0(dataFolder,"GP_STATE.shp"))
grid<-read.csv(paste0(dataFolder, "GP_prediction_grid_data.csv"), header= TRUE)

Data Transformation

Power Transform uses the maximum likelihood-like approach of Box and Cox (1964) to select a transformation of a univariate or multivariate response for normality. First we have to calculate appropriate transformation parameters using powerTransform() function of car package and then use this parameter to transform the data using bcPower() function.

powerTransform(train$SOC)

## Estimated transformation parameter 
## train$SOC 
## 0.2523339

train$SOC.bc<-bcPower(train$SOC, 0.2523339)

First. we have to define x & y variables to coordinates

coordinates(train) = ~x+y
coordinates(grid) = ~x+y

First, we will compute and visualize a first-order trend surface using krige() function.

trend<-krige(SOC.bc~x+y, train, grid, model=NULL)

## [ordinary or weighted least squares prediction]

trend.r<-rasterFromXYZ(as.data.frame(trend)[, c("x", "y", "var1.pred")])
ggR(trend.r, geom_raster = TRUE) +
scale_fill_gradientn("", colours = c("orange", "yellow", "green",  "sky blue","blue"))+
  theme_bw()+
    theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+
   ggtitle("Global Trend of BoxCox-SOC")+
   theme(plot.title = element_text(hjust = 0.5))

Variogram Modeling

In UK, the semivariances are based on the residuals, not the original data, because the random part of the spatial structure applies only to these residuals. The model parameters for the residuals will usually be very different from the original variogram model (often: lower sill, shorter range), since the global trend has taken out some of the variation and the long-range structure. In gstat, we can compute residual varoigram directly, if we provide an appropriate model formula; you do not have to compute the residuals manually.

We use the variogram method and specify the spatial dependence with the formula SOC.bc ~ x+y (as opposed to SOC.bc ~ 1 in the ordinary variogram). This has the same meaning as in the lm (linear regression) model specification: the SOC concentration is to be predicted using Ist order trend; then the residuals are to be modeled spatially.

# Variogram
v<-variogram(SOC.bc~ x+y, data = train, cloud=F)
# Intial parameter set by eye esitmation
m<-vgm(1.5,"Exp",40000,0.5)
# least square fit
m.f<-fit.variogram(v, m)
m.f

##   model     psill    range
## 1   Nug 0.5186164     0.00
## 2   Exp 1.0744976 81954.85

Plot Residuals varigram and Fitted model

#### Plot varigram and fitted model:
plot(v, pl=F, 
     model=m.f,
     col="black", 
     cex=0.9, 
     lwd=0.5,
     lty=1,
     pch=19,
     main="Variogram of Residuals",
     xlab="Distance (m)",
     ylab="Semivariance")

Kriging Prediction

krige() function in gstat package use for simple, ordinary or universal kriging (sometimes called external drift kriging), kriging in a local neighborhood, point kriging or kriging of block mean values (rectangular or irregular blocks), and conditional (Gaussian or indicator) simulation equivalents for all kriging varieties, and function for inverse distance weighted interpolation. For multivariate prediction.

UK<-krige(SOC.bc~x+y, 
              loc= train,        # Data frame
              newdata=grid,      # Prediction grid
              model = m.f)       # fitted varigram model

## [using universal kriging]

summary(UK)

## Object of class SpatialPointsDataFrame
## Coordinates:
##        min     max
## x -1245285  114715
## y  1003795 2533795
## Is projected: NA 
## proj4string : [NA]
## Number of points: 10674
## Data attributes:
##    var1.pred          var1.var     
##  Min.   :-0.1549   Min.   :0.7235  
##  1st Qu.: 1.2799   1st Qu.:0.9298  
##  Median : 1.8873   Median :1.0023  
##  Mean   : 1.8862   Mean   :1.0244  
##  3rd Qu.: 2.5169   3rd Qu.:1.1020  
##  Max.   : 4.1413   Max.   :1.4817

Back transformation

We will back transformation using transformation parameters that have used Box-cos transformation

k1<-1/0.2523339                                   
UK$UK.pred <-((UK$var1.pred *0.2523339+1)^k1)
UK$UK.var <-((UK$var1.var *0.2523339+1)^k1)
summary(UK)

## Object of class SpatialPointsDataFrame
## Coordinates:
##        min     max
## x -1245285  114715
## y  1003795 2533795
## Is projected: NA 
## proj4string : [NA]
## Number of points: 10674
## Data attributes:
##    var1.pred          var1.var         UK.pred            UK.var     
##  Min.   :-0.1549   Min.   :0.7235   Min.   : 0.8539   Min.   :1.944  
##  1st Qu.: 1.2799   1st Qu.:0.9298   1st Qu.: 3.0319   1st Qu.:2.305  
##  Median : 1.8873   Median :1.0023   Median : 4.6814   Median :2.444  
##  Mean   : 1.8862   Mean   :1.0244   Mean   : 5.2142   Mean   :2.497  
##  3rd Qu.: 2.5169   3rd Qu.:1.1020   3rd Qu.: 7.0191   3rd Qu.:2.644  
##  Max.   : 4.1413   Max.   :1.4817   Max.   :17.0323   Max.   :3.521

Convert to raster

UK.pred<-rasterFromXYZ(as.data.frame(UK)[, c("x", "y", "UK.pred")])
UK.var<-rasterFromXYZ(as.data.frame(UK)[, c("x", "y", "UK.var")])

Plot predicted SOC and OK Error

p1<-ggR(UK.pred, geom_raster = TRUE) +
scale_fill_gradientn("", colours = c("orange", "yellow", "green",  "sky blue","blue"))+
  theme_bw()+
    theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+
   ggtitle("UK Predicted SOC")+
   theme(plot.title = element_text(hjust = 0.5))

p2<-ggR(UK.var, geom_raster = TRUE) +
scale_fill_gradientn("", colours = c("blue",  "green","yellow", "orange"))+
  theme_bw()+
    theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+
   ggtitle("UK Predition Variance")+
   theme(plot.title = element_text(hjust = 0.5))

grid.arrange(p1,p2, ncol = 2)  # Multiplot

The visual representations above depict interpolated maps of soil organic carbon (SOC) along with their respective error assessments within each prediction grid. The Ordinary Kriging (OK) predicted map unveils the overarching trends and identifies hotspots of SOC concentration. It’s worth noting that kriging variance tends to be elevated in areas where no actual samples were collected, as it is contingent on the spatial arrangement of the sampling locations. In contrast, lower variance is observed in proximity to the actual sampling sites. This kriging variance is influenced by the covariance model but remains independent of the specific data values.

Note: The data and text are copied from https://github.com/zia207.