1 Loading R Libraries

Package	Title	Maintainer	Version	URL
sp	Classes and Methods for Spatial Data	Edzer Pebesma <edzer.pebesma@uni-muenster.de>;	1.4-2	NA
gstat	Spatial and Spatio-Temporal Geostatistical Modelling, Prediction and Simulation	Edzer Pebesma <edzer.pebesma@uni-muenster.de>;	2.0-6	NA
geoR	Analysis of Geostatistical Data	Paulo J. Ribeiro Jr <paulojus@ufpr.br>;	1.8-1	NA
dismo	Species Distribution Modeling	Robert J. Hijmans <r.hijmans@gmail.com>;	1.1-4	NA
ggplot2	Create Elegant Data Visualisations Using the Grammar of Graphics	Thomas Lin Pedersen <thomas.pedersen@rstudio.com>;	3.3.2	http://ggplot2.tidyverse.org,
summarytools	Tools to Quickly and Neatly Summarize Data	Dominic Comtois <dominic.comtois@gmail.com>;	0.9.6	NA
Amelia	A Program for Missing Data	Matthew Blackwell <mblackwell@gov.harvard.edu>;	1.7.6	NA
lattice	Trellis Graphics for R	Deepayan Sarkar <deepayan.sarkar@r-project.org>;	0.20-38	NA
dplyr	A Grammar of Data Manipulation	Hadley Wickham <hadley@rstudio.com>;	1.0.0	https://dplyr.tidyverse.org,
gridExtra	Miscellaneous Functions for “Grid” Graphics	Baptiste Auguie <baptiste.auguie@gmail.com>;	2.3	NA
kableExtra	Construct Complex Table with ‘kable’ and Pipe Syntax	Hao Zhu <haozhu233@gmail.com>;	1.1.0	http://haozhu233.github.io/kableExtra/,

2 Dataset Description

The Walker Lake Data Set: was derived from a digital elevation model from the western United States; the Walker Lake area in Nevada. This is the Walker Lake data sets, used in Isaaks and Srivastava’s Applied Geostatistics. This data frame contains the following columns:

Id:Identification Number

X: East location in meter

Y: North location in meter

V: V variable, concentration in ppm

U :U variable, concentration in ppm

T: T variable, indicator variable

3 Loading Dataframe

# Walker Lake Data Set
domain <- read.csv("walker.csv", header=TRUE, skip=0, sep=",", dec=".")
domain$V <- as.numeric(domain$V)
domain$U <- as.numeric(domain$U)
head(domain, 5)

str(domain)

## 'data.frame':    470 obs. of  6 variables:
##  $ IDSAMPLE: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ X       : int  11 8 9 8 9 10 9 11 10 8 ...
##  $ Y       : int  8 30 48 69 90 110 129 150 170 188 ...
##  $ V       : num  0.7 0.7 224.4 434.4 412.1 ...
##  $ U       : num  0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
##  $ IND     : int  2 2 2 2 2 2 2 2 2 2 ...

4 Exploratory Data Analysis

4.1 Summary Statistics

print(dfSummary(domain[, 4:5], graph.magnif = 0.75), method='render')

Data Frame Summary

domain

Dimensions: 470 x 2
Duplicates: 25

No	Variable	Stats / Values	Freqs (% of Valid)	Graph	Valid	Missing
1	V [numeric]	Mean (sd) : 435.3 (299.8) min < med < max: 0.7 < 424 < 1528.1 IQR (CV) : 456.2 (0.7)	441 distinct values		470 (100%)	0 (0%)
2	U [numeric]	Mean (sd) : 353.5 (657.9) min < med < max: 0.1 < 25.3 < 5190.1 IQR (CV) : 428.5 (1.9)	260 distinct values		470 (100%)	0 (0%)

Generated by summarytools 0.9.6 (R version 3.6.3)
2020-07-30

TBL.Summary <- summarytools::descr(domain[, 4:5])
TBL.Summary

## Descriptive Statistics  
## domain  
## N: 470  
## 
##                           U         V
## ----------------- --------- ---------
##              Mean    353.49    435.33
##           Std.Dev    657.87    299.83
##               Min      0.10      0.70
##                Q1      0.10    184.40
##            Median     25.30    424.00
##                Q3    429.70    641.30
##               Max   5190.10   1528.10
##               MAD     37.36    342.48
##               IQR    428.52    456.25
##                CV      1.86      0.69
##          Skewness      2.98      0.46
##       SE.Skewness      0.11      0.11
##          Kurtosis     11.65     -0.14
##           N.Valid    470.00    470.00
##         Pct.Valid    100.00    100.00

4.2 Location Map, Hitogram, and Drifts)

domain.geoR <- geoR::as.geodata(domain, coords.col = 2:3, data.col = 4)
plot(domain.geoR, lowess=TRUE, scatter3d=FALSE, cex=0.8)

dev.print(png, "basemap1.png", height=900, width=1200)           # save to the directory

## png 
##   2

4.3 Probability-Plot

library("car")
qqPlot(domain$V, distribution="norm", lwd=2, pch="+", line="robust", las=0, cex=0.8, col.lines="red", main="Quantile-Comparison Plot")

## [1] 232 219

5 Decile Analysis

# Top Decile
table.deciles <- domain %>%
  mutate(DECILE = ntile(V, 10)) %>%
  group_by(DECILE) %>%
  summarize(
    Count      = n(),
    Average    = mean(V),
    Min        = min(V),
    Max        = max(V),
    Q          = sum(V)) %>%
  mutate(
    Perc.Total = 100 * Q / sum(Q)
  )

# Top Percentile
perc.top10 <- domain %>%
  mutate(DECILE = ntile(V, 100)) %>%
  group_by(DECILE) %>%
  summarize(
    Count      = n(),
    Average    = mean(V),
    Min        = min(V),
    Max        = max(V),
    Q          = sum(V)) %>%
  mutate(
    Perc.Total = 100 * Q / sum(Q)
  )
table.percentiles <- tail(perc.top10, 10)

# Merge the tables
deciles <- rbind(table.deciles, table.percentiles)
kable(deciles, format = "html", align = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

DECILE	Count	Average	Min	Max	Q	Perc.Total
1	47	9.704255	0.7	30.9	456.1	0.2229165
2	47	85.287234	31.3	141.9	4008.5	1.9591331
3	47	185.142553	144.8	233.6	8701.7	4.2529097
4	47	274.987234	234.0	329.1	12924.4	6.3167320
5	47	375.048936	332.7	423.4	17627.3	8.6152494
6	47	472.206383	424.6	518.6	22193.7	10.8470532
7	47	566.812766	518.9	602.4	26640.2	13.0202565
8	47	640.940425	602.6	689.1	30124.2	14.7230430
9	47	759.797872	696.5	817.0	35710.5	17.4533176
10	47	983.387234	820.8	1528.1	46219.2	22.5893890
91	4	871.925000	868.8	876.0	3487.7	1.7045949
92	4	879.750000	876.4	883.6	3519.0	1.7198926
93	4	892.375000	889.5	895.2	3569.5	1.7445742
94	4	909.125000	899.3	915.6	3636.5	1.7773201
95	4	935.725000	918.0	959.3	3742.9	1.8293225
96	4	968.750000	963.9	971.9	3875.0	1.8938857
97	4	990.300000	974.5	1008.0	3961.2	1.9360155
98	4	1052.750000	1022.3	1082.8	4211.0	2.0581039
99	4	1160.575000	1109.0	1215.8	4642.3	2.2688995
100	4	1425.425000	1259.9	1528.1	5701.7	2.7866756

write.csv(deciles, file = "deciles.csv")

5.1 Top Capping Outliers

# Boxplot
OutVals <- boxplot(domain$V, col="gray", frame=F, notch=F, main="Boxplot")$out

OutVals

## [1] 1521.1 1528.1 1392.6

# Top Capping values
library(data.table)
outlierReplace = function(dataframe, cols, rows, newValue = NA) {
    if (any(rows)) {
        set(dataframe, rows, cols, newValue)
    }
}

outlierReplace(domain, "V", which(domain$V > 1392.6), 1392.6)

# Check
max(domain$V)

## [1] 1392.6

5.2 Correlation Matrix

tiff(file = "Correlation_Matrix.png", width=960, height=960, bg="white")
PerformanceAnalytics::chart.Correlation(as.data.frame(domain)[, 4:5], histogram=TRUE, pch="*", method = c("spearman"))
dev.off()

## png 
##   2

PerformanceAnalytics::chart.Correlation(as.data.frame(domain)[, 4:5], histogram=TRUE, pch="*", method = c("spearman"))

5.3 Drifts (Derivas)

# Residuals analysis
V.lm <- lm(log(V) ~ sqrt(Y), domain)
domain$fitted.s <- predict(V.lm, domain) - mean(predict(V.lm, domain))
domain$residuals <- residuals(V.lm)

# Convert to spatial dataframe
domain.cpy <- domain
coordinates(domain.cpy) <- ~ X + Y
spplot(domain.cpy, c("fitted.s", "residuals"), col.regions=rainbow(100))

5.4 Eliminating the Trends

plot(domain.geoR, trend=~coords, col.regions=rainbow(100))

6 Convert to Spatial Dataframe

Variogram model controls Kriging weights. Variogram is mathematically defined as a measure of semivariance as a function of distance.

sp::coordinates(domain) <- ~ X + Y
str(domain)

## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  470 obs. of  6 variables:
##   .. ..$ IDSAMPLE : int [1:470] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..$ V        : num [1:470] 0.7 0.7 224.4 434.4 412.1 ...
##   .. ..$ U        : num [1:470] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
##   .. ..$ IND      : int [1:470] 2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ fitted.s : num [1:470] 0.3701 0.2544 0.191 0.1307 0.0791 ...
##   .. ..$ residuals: num [1:470] -6.206 -6.09 -0.257 0.464 0.463 ...
##   ..@ coords.nrs : int [1:2] 2 3
##   ..@ coords     : num [1:470, 1:2] 11 8 9 8 9 10 9 11 10 8 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:470] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:2] "X" "Y"
##   ..@ bbox       : num [1:2, 1:2] 8 8 251 291
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "X" "Y"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
##   .. .. ..@ projargs: chr NA

6.1 H-Scatterplots

gstat::hscat(V~1, domain, c(0, 5, 10, 20, 30, 50), pch=3, cex=0.8, col="red")

7 Fitting a Theoretical Variogram (Omnidirectional)

To perform kriging, you must first have a variogram model, from which the data can be interpolated. There are a couple steps involved:

Calculate the variogram cloud and experimental variogram. This is done with the variogram function,
Fit a model to the sample variogram. Default initial parameter values are chosen from the sample variogram, where:

the range parameter is taken as 1/3 of the maximum sample variogram distance,
- the nugget parameter is taken as the mean of the first three sample variogram values, and
- the partial sill is taken as the mean of the last five sample variogram values.

In this first part, we are going to calculate the omnidirectional variogram. Later, we are going to calculate an directional variogram. Here, we assume that there is a constant trend in the data.

Note 1: Model type in vgm function are: “Exp”, “Sph”, “Gau”, “Mat” Note 2: Values for fit.method are 1: weights equal to \(N_j\); 2: weights equal to \(N_j/((gamma(h_j))^2)\); 5 (ignore, use fit.variogram.reml); 6: unweighted (OLS); 7: \(N_j/(h_j^2)\).

omnivgram <- gstat::variogram(V~1, 
                                  width=10,             # lag
                                  cutoff=200,           # max distance to calculate the variogram
                                  tol.hor=90, 
                                  tol.ver=90,
                                  data=domain)
plot(omnivgram, threshold=30, col="red", xlab="Distance", ylab="Semivariance", pch=20, ylim=c(20000, 100000), cex=1.5, main="Omnidirectional Semivariogram", cex.main=2.0, cex.lab=1.25, cex.axis=1.25)

dev.print(png, "omnivgram.png", height=600, width=700)           # save to the directory

## png 
##   2

omnivgram

# Fit the theoretical variogram model
# psill is "Partial Sill" (do not confound with the sill)
omnivgmodel <- fit.variogram(omnivgram, 
                            model=vgm(psill=var(domain$V), model="Sph", range=NA, nugget=NA),
                            fit.sills=T, 
                            fit.ranges=T, 
                            fit.method=7, 
                            warn.if.neg=T, 
                            fit.kappa=F)
print(plot(variogramLine(omnivgmodel, 200, threshold=30), type='l', lwd = 2, ylim=c(0, 100000), main="Omnidirectional Semivariogram Model", xlab="Distance", ylab="Semivariance", cex.main=2.0, cex.lab=1.25, cex.axis=1.25))

## NULL

points(omnivgram[,2:3], pch=20, cex=1.5, col='red')

dev.print(png, "omnivgmodel.png", height=600, width=700)           # save to the directory

## png 
##   2

omnivgmodel

8 Variogram Cloud

cloud <- variogram(V~1, data=domain, cloud=TRUE)

# Plot
plot(cloud$dist, cloud$gamma, pch="+", cex=1.5, xlab="Distance [m]", ylab="Semivariance", main="Semivariogram Cloud", col = rainbow(length(cloud$gamma))[rank(cloud$gamma)])
points(omnivgram[,2:3], type= "l", col='black')

9 Identify Outliers pairs

sel.cloud <- plot(variogram(V~1, data=domain, cloud=T), pch = 16, cex = 1, digitize = F, col = "green4")

# Interactive selection
sel.cloud

out.pair <- function(x, data, sel){
    a <- as.data.frame(data[sel$head,as.character(x)])
    b <- as.data.frame(data[sel$tail,as.character(x)])
    ID.a <- round(as.numeric(rownames(as.data.frame(data[sel$head,]))),0)
    ID.b <- round(as.numeric(rownames(as.data.frame(data[sel$tail,]))),0)
    out <- cbind(ID.a,a[,as.character(x)],ID.b,b[,as.character(x)])
    colnames(out) <- c("ID.a",paste(x,".a",sep=""),"ID.b",paste(x,".b",sep=""))
    out
    }

out.pair(x="V", data = domain, sel = sel.cloud)

##      ID.a V.a ID.b V.b

The recommendation in most cases is to play with these parameters to be sure that the variogram is correctly displayed. Among these settings, two are of particular interest: cutoff and width. The cutoff represents the maximal spatial distance taken into account between two observations. The width (similar to lag) is the distance interval over which the semi-variance is calculated.

10 Analyzing the Anisotropy (Variogram Map)

Another means of looking at directional dependence in semivariograms is obtained by looking at variogram maps. Instead of classifying point pairs Z(s) and Z(s + h) by direction and distance class separately, we can classify them jointly.

vgram.map <- variogram(V~1, domain, cutoff=200, width=10, map=TRUE)       # cutoff == maxdist; width == lag
plot(vgram.map, threshold=0, main="Variogram Map", col.regions=rainbow(100))

The threshold assures that only semivariogram map values based on at least N point pairs are shown, removing too noisy estimation.

11 Directional Semivariograms

In this second part, we are going to calculate the directional semivariograms.

dirvgrams <- gstat::variogram(V~1, 
                                 width=10, 
                                 cutoff=200, 
                                 tol.hor=10, 
                                 tol.ver=90, 
                                 alpha=c(180,160,140,120,100,80,60,40,20), 
                                 data=domain)

plot(dirvgrams, threshold=30, as.table = TRUE, col="red", pch=20, cex=1.2, lwd = 2, main="Directional Semivariograms", cex.main=2.0, cex.lab=1.25, cex.axis=1.25)

vario.4 <- geoR::variog4(domain.geoR, max.dist = 200)

## variog: computing variogram for direction = 0 degrees (0 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 45 degrees (0.785 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 90 degrees (1.571 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 135 degrees (2.356 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing omnidirectional variogram

plot(vario.4, lwd = 2, main="Directional Semivariograms")

12 Fit the Theoretical Directional Variogram Model

In two dimensions, two parameters define an anisotropy ellipse, say anis = c(30, 0.5). The first parameter, 30, refers to the main axis direction: it is the angle for the principal direction of continuity (measured in degrees, clockwise from positive Y, i.e. North). The second parameter, 0.5, is the anisotropy ratio, the ratio of the minor range to the major range (a value between 0 and 1).

dirvgram.final <- gstat::variogram(V~1, 
                                 width=10, 
                                 cutoff=200, 
                                 tol.hor=10, 
                                 tol.ver=90, 
                                 alpha=c(160), 
                                 data=domain)
dirvgram.final

# Fit the theoretical variogram model with gstat
# Note: psill is "Partial Sill" (do not confound with the sill)
dirvgmodel.final <- fit.variogram(dirvgram.final, model=vgm(psill=NA, model="Sph", range=NA, nugget=NA, 
                            anis=c(160, 0.7),
                            fit.sills=T, 
                            fit.ranges=T, 
                            fit.method=7, 
                            warn.if.neg=T, 
                            fit.kappa=F))
print(plot(variogramLine(dirvgmodel.final, 200, threshold=30), type='l', lwd = 2, ylim=c(0, 100000), main="Final Semivariogram Model", xlab="Distance", ylab="Semivariance", cex.main=2.0, cex.lab=1.25, cex.axis=1.25))

## NULL

points(dirvgram.final[,2:3], pch=20, cex=1.5, col='red')

dev.print(png, "dirvgmodel.final.png", height=600, width=900)

## png 
##   2

dirvgmodel.final

Performing a semi-variogram analysis requires to carefully set the variogram parameters so that the spatial structure is well-captured. manual supervision is really important when dealing with semi-variograms. I would recommend not to automate a semi-variogram analysis as it could lead to misleading conclusions. If the spatial structure is very well-defined and extremely clear, this won’t be a problem because default parameters might be used. However, in most cases, the automatic fit of a semi-variogram model might be problematic.

13 Cross Validation Using gstat

# five-fold cross validation (map of residuals)
xvalid <- gstat::krige.cv(V~1, domain, dirvgmodel.final, nmax=32, nfold=5)
bubble(xvalid, "residual", main = "V: 5-fold CV Residuals")

# Summary
summary(xvalid)

## Object of class SpatialPointsDataFrame
## Coordinates:
##   min max
## X   8 251
## Y   8 291
## Is projected: NA 
## proj4string : [NA]
## Number of points: 470
## Data attributes:
##    var1.pred           var1.var        observed         residual      
##  Min.   :  -8.468   Min.   :25825   Min.   :   0.7   Min.   :-670.85  
##  1st Qu.: 269.233   1st Qu.:33324   1st Qu.: 184.6   1st Qu.:-131.95  
##  Median : 434.321   Median :38719   Median : 424.0   Median : -19.91  
##  Mean   : 450.225   Mean   :44942   Mean   : 434.8   Mean   : -15.46  
##  3rd Qu.: 592.660   3rd Qu.:55405   3rd Qu.: 640.9   3rd Qu.: 113.31  
##  Max.   :1155.451   Max.   :86776   Max.   :1392.6   Max.   : 556.49  
##      zscore              fold      
##  Min.   :-3.49078   Min.   :1.000  
##  1st Qu.:-0.60603   1st Qu.:2.000  
##  Median :-0.10109   Median :3.000  
##  Mean   :-0.04974   Mean   :3.038  
##  3rd Qu.: 0.55600   3rd Qu.:4.000  
##  Max.   : 3.06539   Max.   :5.000

Histogram of Residuals and Scatterplot Residuals vs Fitted

par(mfrow=c(1, 2))
hist(xvalid$residual, col="green4", main=paste("Histogram: 5-fold CV Residuals"))
# Scatterplots Residuals vs Fitted
plot(lm(xvalid$observed~xvalid$var1.pred, pch="+", col="green4", cex=1.0, xlab="Actual", 
     ylab="Fitted", xlim=c(0, max(domain$V)), ylim=c(0, max(domain$V)), main="Cross Validation OK Residuals"))

## Warning: In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
##  extra arguments 'pch', 'col', 'cex', 'xlab', 'ylab', 'xlim', 'ylim', 'main' will be disregarded

abline(lm(xvalid$observed~xvalid$var1.pred))

Comparing errors to standard errors

sigmazscores <- abs(xvalid$zscore) > 2
xvalid$big <- factor(ifelse(xvalid$zscore <= -2, 'low', ifelse(xvalid$zscore < 2, 'moderate', 'positive')))
xvalid

## class       : SpatialPointsDataFrame 
## features    : 470 
## extent      : 8, 251, 8, 291  (xmin, xmax, ymin, ymax)
## crs         : NA 
## variables   : 7
## names       :         var1.pred,         var1.var, observed,          residual,            zscore, fold,      big 
## min values  : -8.46797696528068, 25824.5208454136,      0.7, -670.847801683615, -3.49078153665091,    1,      low 
## max values  :  1155.45068296099, 86776.1755570282,   1392.6,  556.486813365955,  3.06538750311156,    5, positive

length(xvalid$zscore[sigmazscores])

## [1] 19

Plot Standard Error of Residuals (Zscores > |2|)

spplot(xvalid, zcol='big', pch=c(19, 1, 19), col.regions=c('blue2','black', 'red3'), scales=list(draw=T), main='Standard Error OK-XValid Residuals (z-score > |2|)')

Finally, we check for spatial auto-correlation in the residuals - that is, to what extent are the errors independent? Are there structures in this plot?

cv2.vg <- variogram(residual~1, xvalid, width=0.15)
plot(cv2.vg, pch=20, cex=1.5, col='red', xlab="Distance", ylab="Semivariance", main="Variogram for XValid-Residuals")

Using root of the mean squared error (RMSE) to compare models and evaluate which model is the best; the lower RMSE, the better of the model

RMSE <- sqrt(sum(xvalid$residual^2)/length(xvalid$residual)) 
round(RMSE, digits = 2)

## [1] 183.29

14 Make a Grid for Interpolation

grid <- makegrid(domain, cellsize=2.5)
names <- c("X", "Y")
colnames(grid) <- names
coordinates(grid) = ~X+Y

15 Declustered Model (Thiessen Polygons - proximity model)

nn.model <- dismo::voronoi(domain)
nn.plot <- spplot(nn.model["V"], col.regions=terrain.colors(50), scales=list(draw=T), p.layout=list("V", domain, pch="+"), main = "Proximity Model")
nn.plot

# Summary Statistics
summary(nn.model)

## Object of class SpatialPolygonsDataFrame
## Coordinates:
##     min   max
## x -16.3 275.3
## y -20.3 319.3
## Is projected: NA 
## proj4string : [NA]
## Data attributes:
##     IDSAMPLE           V                U               IND       
##  Min.   :  1.0   Min.   :   0.7   Min.   :   0.1   Min.   :1.000  
##  1st Qu.:118.2   1st Qu.: 184.6   1st Qu.:   0.1   1st Qu.:2.000  
##  Median :235.5   Median : 424.0   Median :  25.3   Median :2.000  
##  Mean   :235.5   Mean   : 434.8   Mean   : 353.5   Mean   :1.904  
##  3rd Qu.:352.8   3rd Qu.: 640.9   3rd Qu.: 428.6   3rd Qu.:2.000  
##  Max.   :470.0   Max.   :1392.6   Max.   :5190.1   Max.   :2.000  
##     fitted.s          residuals      
##  Min.   :-0.25174   Min.   :-6.2062  
##  1st Qu.:-0.13652   1st Qu.:-0.2273  
##  Median :-0.02241   Median : 0.4976  
##  Mean   : 0.00000   Mean   : 0.0000  
##  3rd Qu.: 0.10286   3rd Qu.: 0.9952  
##  Max.   : 0.37012   Max.   : 1.8698

16 Non-Geostatistical Interpolation Method: Inverse Distance (power=2.0)

idw.model <- idw(V~1, 
                locations = domain,
                nmin=4,
                nmax=32,
                maxdist=150,
                newdata = grid,
                idp=2.0)

## [inverse distance weighted interpolation]

idw.plot <- spplot(idw.model["var1.pred"], col.regions=terrain.colors(50), scales=list(draw=T), p.layout=list("var1.pred", domain, pch="+"), main = "Inverse Distance Interpolation")
idw.plot

# Output
write.csv(idw.model, file="OK_model.csv")

17 Interpolating with Ordinary Kriging (Local Kriging)

ok.model <- krige(V~1, 
                locations = domain,
                nmin=4,
                nmax=32,
                maxdist=150,
                newdata = grid,
                model = dirvgmodel.final)

## [using ordinary kriging]

# Summary
summary(ok.model)

## Object of class SpatialPointsDataFrame
## Coordinates:
##    min    max
## X 9.25 251.75
## Y 9.25 291.75
## Is projected: NA 
## proj4string : [NA]
## Number of points: 11172
## Data attributes:
##    var1.pred         var1.var    
##  Min.   : -54.9   Min.   :22068  
##  1st Qu.: 124.8   1st Qu.:31642  
##  Median : 244.1   Median :40612  
##  Mean   : 285.0   Mean   :39354  
##  3rd Qu.: 393.2   3rd Qu.:46490  
##  Max.   :1301.8   Max.   :58050

# Kriging Plan View
ok.plot <- spplot(ok.model["var1.pred"], col.regions=terrain.colors(50), scales=list(draw=T), p.layout=list("var1.pred", domain, pch="+"), main = "Local Ordinary Kriging Estimation")
ok.plot

# Kriging Variance
kv.plot <- spplot(ok.model["var1.var"], col.regions=terrain.colors(50), scales=list(draw=T), p.layout=list("var1.var", domain, pch="+"), main = "Ordinary Kriging Variance")
kv.plot

# Output
write.csv(ok.model, file="OK_model.csv")

18 Conditional Gaussian Simulation

# One Conditional Simulation
set.seed(619)
condsim1 = krige(V ~ 1, domain, grid, model = dirvgmodel.final, nmin=2, nmax=32, maxdist=150, nsim=1)

## drawing 1 GLS realisation of beta...
## [using conditional Gaussian simulation]

sgsim1.plot <- spplot(condsim1, col.regions=terrain.colors(50), main = "Conditional Simulation 1")

set.seed(317)
condsim2 = krige(V ~ 1, domain, grid, model = dirvgmodel.final, nmin=2, nmax=32, maxdist=150, nsim=1)

## drawing 1 GLS realisation of beta...
## [using conditional Gaussian simulation]

sgsim2.plot <- spplot(condsim2, col.regions=terrain.colors(50), main = "Conditional Simulation 2")

set.seed(501)
condsim3 = krige(V ~ 1, domain, grid, model = dirvgmodel.final, nmin=2, nmax=32, maxdist=150, nsim=1)

## drawing 1 GLS realisation of beta...
## [using conditional Gaussian simulation]

sgsim3.plot <- spplot(condsim3, col.regions=terrain.colors(50), main = "Conditional Simulation 3")

# N Conditional Simulations
par(mfrow=c(1, 2))
allcondsim = krige(V ~ 1, domain, grid, model = dirvgmodel.final, nmin=1, nmax=32, nsim=8)

## drawing 8 GLS realisations of beta...
## [using conditional Gaussian simulation]

spplot(allcondsim, col.regions=terrain.colors(50), scales=list(draw=T), p.layout=list("sp.points", domain, pch="+"), main = "Conditional Gaussian Simulations")

# Output
write.csv(allcondsim, file="CondSimulations_model.csv")

grid.arrange(ok.plot, sgsim1.plot, ncol=2)

19 Validation

Kriging Estimation must be validated through: 1. Global mean vs. declustered data, 2. Swath plots, 3. Visual sections, 4. GT-Curves (Smoothing)

Note: simulations must reproduce the histogram and the spatial correlation model (variogram).

20 Session Info

sessionInfo()

## R version 3.6.3 (2020-02-29)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] data.table_1.12.8  car_3.0-8          carData_3.0-4      kableExtra_1.1.0  
##  [5] gridExtra_2.3      dplyr_1.0.0        lattice_0.20-38    Amelia_1.7.6      
##  [9] Rcpp_1.0.5         summarytools_0.9.6 ggplot2_3.3.2      dismo_1.1-4       
## [13] raster_3.3-13      geoR_1.8-1         gstat_2.0-6        sp_1.4-2          
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2                 tidyr_1.1.0               
##  [3] jsonlite_1.7.0             viridisLite_0.3.0         
##  [5] highr_0.8                  pander_0.6.3              
##  [7] cellranger_1.1.0           yaml_2.2.1                
##  [9] pillar_1.4.6               backports_1.1.7           
## [11] quadprog_1.5-8             glue_1.4.1                
## [13] digest_0.6.25              pryr_0.1.4                
## [15] checkmate_2.0.0            rvest_0.3.6               
## [17] colorspace_1.4-1           htmltools_0.5.0           
## [19] plyr_1.8.6                 pkgconfig_2.0.3           
## [21] haven_2.3.1                magick_2.4.0              
## [23] purrr_0.3.4                scales_1.1.1              
## [25] intervals_0.15.2           webshot_0.5.2             
## [27] openxlsx_4.1.5             rio_0.5.16                
## [29] tibble_3.0.3               generics_0.0.2            
## [31] ellipsis_0.3.1             withr_2.2.0               
## [33] deldir_0.1-28              readxl_1.3.1              
## [35] magrittr_1.5               crayon_1.3.4              
## [37] evaluate_0.14              MASS_7.3-51.5             
## [39] forcats_0.5.0              xts_0.12-0                
## [41] xml2_1.3.2                 foreign_0.8-75            
## [43] RandomFieldsUtils_0.5.3    FNN_1.1.3                 
## [45] rapportools_1.0            tools_3.6.3               
## [47] hms_0.5.3                  lifecycle_0.2.0           
## [49] matrixStats_0.56.0         stringr_1.4.0             
## [51] munsell_0.5.0              zip_2.0.4                 
## [53] compiler_3.6.3             spacetime_1.2-3           
## [55] rlang_0.4.7                grid_3.6.3                
## [57] RandomFields_3.3.8         rstudioapi_0.11           
## [59] PerformanceAnalytics_2.0.4 tcltk_3.6.3               
## [61] base64enc_0.1-3            rmarkdown_2.3             
## [63] gtable_0.3.0               codetools_0.2-16          
## [65] abind_1.4-5                curl_4.3                  
## [67] R6_2.4.1                   splancs_2.01-40           
## [69] zoo_1.8-8                  lubridate_1.7.9           
## [71] knitr_1.29                 readr_1.3.1               
## [73] stringi_1.4.6              vctrs_0.3.2               
## [75] tidyselect_1.1.0           xfun_0.16

This concludes the report!

Variography, Kriging and Conditional Gaussian Simulation (Walker Lake DataSet)

Created By Julio C. SOLANO (Universidad Nacional de Colombia - Medellín)

July 18th, 2020