The variogram (or semivariogram) is a fundamental tool in geostatistics that quantifies spatial autocorrelation. It measures how similarity between observations changes with distance.
The Semivariance Formula: \[\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} [Z(x_i) - Z(x_i + h)]^2\]
Where: - \(\gamma(h)\) = semivariance at lag distance \(h\) - \(N(h)\) = number of pairs at distance \(h\) - \(Z(x_i)\) = value at location \(x_i\) - \(Z(x_i + h)\) = value at location separated by distance \(h\)
| Parameter | Symbol | Description |
|---|---|---|
| Nugget | \(C_0\) | Y-intercept, measurement error + microscale variation |
| Partial Sill | \(C_1\) | Spatially structured variance |
| Sill | \(C_0 + C_1\) | Total variance (plateau) |
| Range | \(a\) | Distance where 63% of sill reached |
| Practical Range | \(3a\) | Distance where 95% of sill reached |
\[\gamma(h) = C_0 + C_1(1 - e^{-h/a})\]
library(spBayes) # For WEF.dat data
library(classInt) # For class intervals
library(RColorBrewer) # For color palettes
library(MBA) # For spatial interpolation
library(fields) # For image plotting
library(geoR) # For variogram analysis
library(sp) # For spatial data handling
library(gstat) # For variogram modeling (more stable)
library(MASS) # For mvrnorm
library(knitr)
# Optional: Color palette function
col.br <- function(n) {
colorRampPalette(c("blue", "cyan", "yellow", "red"))(n)
}
# Load data from spBayes
data(WEF.dat)
# Clean data - remove rows with NA values
WEF.dat <- WEF.dat[!apply(WEF.dat[,c("East_m","North_m", "DBH_cm","Tree_height_m","ELEV_m")],
1, function(x) any(is.na(x))),]
# View data structure
cat("Data dimensions:", dim(WEF.dat), "\n")
## Data dimensions: 1955 15
cat("Variable names:", names(WEF.dat), "\n\n")
## Variable names: Tree_ID Species DBH_cm Live_dead Stump North_m East_m ELEV_m Tree_height_m Crown_depth_m Mean_crown_radius_m_from_center Crown_radius_north_m_from_center Crown_radius_east_m_from_center Crown_radius_west_m_from_center Crown_radius_south_m_from_center
# Summary statistics
summary(WEF.dat[, c("DBH_cm", "Tree_height_m", "ELEV_m")])
## DBH_cm Tree_height_m ELEV_m
## Min. : 5.40 Min. : 2.86 Min. :1100
## 1st Qu.: 14.35 1st Qu.:11.96 1st Qu.:1107
## Median : 23.20 Median :17.72 Median :1109
## Mean : 36.77 Mean :23.66 Mean :1109
## 3rd Qu.: 47.00 3rd Qu.:35.00 3rd Qu.:1113
## Max. :207.70 Max. :62.87 Max. :1118
# Extract variables for analysis
coords <- as.matrix(WEF.dat[, c("East_m", "North_m")])
DBH <- WEF.dat$DBH_cm
Height <- WEF.dat$Tree_height_m
Elevation <- WEF.dat$ELEV_m
# Check if Species exists
has_species <- "Species" %in% names(WEF.dat)
if(has_species) {
Species <- WEF.dat$Species
cat("\nSpecies present:", unique(Species), "\n")
cat("Species counts:\n")
print(table(Species))
}
##
## Species present: 1 6 4 3 2
## Species counts:
## Species
## DF GF NF SF UNK WH
## 290 32 2 1128 0 503
# First few rows
head(WEF.dat)
## Tree_ID Species DBH_cm Live_dead Stump North_m East_m ELEV_m Tree_height_m
## 171 9 DF 121.6 L 90.92 221.70 1113.12 54.90
## 172 10 DF 108.0 L 92.72 220.87 1113.40 55.96
## 173 11 DF 109.0 L 97.70 220.87 1113.20 49.06
## 174 12 WH 72.8 L 99.48 215.76 1113.07 44.44
## 175 17 WH 64.0 L 104.57 224.64 1113.34 47.10
## 176 21 DF 111.3 L 98.29 229.13 1113.06 54.35
## Crown_depth_m Mean_crown_radius_m_from_center
## 171 31.70 2.98
## 172 29.45 1.73
## 173 18.28 3.42
## 174 38.72 4.44
## 175 33.91 4.16
## 176 14.27 2.78
## Crown_radius_north_m_from_center Crown_radius_east_m_from_center
## 171 1.41 2.58
## 172 1.47 1.24
## 173 5.40 2.96
## 174 4.40 3.01
## 175 7.39 2.59
## 176 2.01 1.93
## Crown_radius_west_m_from_center Crown_radius_south_m_from_center
## 171 4.05 3.90
## 172 2.34 1.88
## 173 2.99 2.33
## 174 5.60 4.73
## 175 3.73 2.94
## 176 3.46 3.71
# Set up plotting
par(mfrow = c(2, 2))
# DBH distribution
hist(DBH,
main = "DBH Distribution",
xlab = "DBH (cm)",
col = "lightblue",
breaks = 20)
# Boxplot of DBH by species (if available)
if(has_species) {
boxplot(DBH ~ Species, data = WEF.dat,
main = "DBH by Species",
xlab = "Species",
ylab = "DBH (cm)",
col = brewer.pal(length(unique(Species)), "Set2"))
} else {
boxplot(DBH,
main = "DBH Boxplot",
ylab = "DBH (cm)",
col = "lightblue")
}
# Spatial distribution of trees
plot(coords,
pch = 16,
cex = 0.8,
xlab = "Easting (m)",
ylab = "Northing (m)",
main = "Tree Locations",
col = rgb(0, 0, 1, 0.5))
title(sub = paste("n =", nrow(WEF.dat)))
# DBH vs Elevation
plot(Elevation, DBH,
xlab = "Elevation (m)",
ylab = "DBH (cm)",
main = "DBH vs Elevation",
pch = 16,
col = rgb(0, 0, 1, 0.3))
# Add smoothing line if enough points
if(nrow(WEF.dat) > 10) {
fit_elev <- loess(DBH ~ Elevation, data = WEF.dat)
elev_order <- order(Elevation)
lines(Elevation[elev_order],
fit_elev$fitted[elev_order],
col = "red", lwd = 2)
}
# Create interpolation surface
x.res <- 100
y.res <- 100
surf <- mba.surf(cbind(coords, DBH),
no.X = x.res,
no.Y = y.res,
h = 5,
m = 2,
extend = FALSE)$xyz.est
# Plot
image.plot(surf,
xaxs = "r",
yaxs = "r",
xlab = "Easting (m)",
ylab = "Northing (m)",
col = col.br(25),
main = "DBH Spatial Distribution")
contour(surf, add = TRUE)
# Add tree locations
points(coords, pch = 16, cex = 0.5)
boxplot(DBH ~ Species, data = WEF.dat,
main = "DBH by Species",
xlab = "Species",
ylab = "DBH (cm)",
col = c("lightblue", "lightgreen", "lightcoral"))
# Maximum distance for variogram (25% of maximum pairwise distance)
max.dist <- 0.25 * max(dist(coords))
bins <- 50
# Compute variogram for original DBH data
vario.DBH <- variog(coords = coords,
data = DBH,
uvec = seq(0, max.dist, length = bins))
## variog: computing omnidirectional variogram
# View the results
str(vario.DBH)
## List of 20
## $ u : num [1:50] 0 1.8 3.6 5.4 7.2 ...
## $ v : num [1:50] 371 526 749 874 873 ...
## $ n : num [1:50] 127 1134 2241 3192 4225 ...
## $ sd : num [1:50] 913 1372 1775 1870 1938 ...
## $ bins.lim : num [1:51] 1.0e-12 9.0e-01 2.7 4.5 6.3 ...
## $ ind.bin : logi [1:50] TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ var.mark : num 1053
## $ beta.ols : num 36.8
## $ output.type : chr "bin"
## $ max.dist : num 89.1
## $ estimator.type : chr "classical"
## $ n.data : int 1955
## $ lambda : num 1
## $ trend : chr "cte"
## $ pairs.min : num 2
## $ nugget.tolerance: num 1e-12
## $ direction : chr "omnidirectional"
## $ tolerance : chr "none"
## $ uvec : num [1:50] 0 1.8 3.6 5.4 7.2 ...
## $ call : language variog(coords = coords, data = DBH, uvec = seq(0, max.dist, length = bins))
## - attr(*, "class")= chr "variogram"
What this does: 1. Calculates all pairwise distances between trees 2. Groups pairs into 50 distance bins 3. Computes semivariance for each bin: - Average squared difference / 2
# Fit exponential model to original data
fit.DBH <- variofit(vario.DBH,
ini.cov.pars = c(600, 200/-log(0.05)),
cov.model = "exponential",
minimisation.function = "nls",
weights = "equal")
## variofit: covariance model used is exponential
## variofit: weights used: equal
## variofit: minimisation function used: nls
# View fitted parameters
print(fit.DBH)
## variofit: model parameters estimated by OLS (ordinary least squares):
## covariance model is: exponential
## parameter estimates:
## tausq sigmasq phi
## 542.2718 455.9963 12.7095
## Practical Range with cor=0.05 for asymptotic range: 38.07425
##
## variofit: minimised sum of squares = 151227.4
Understanding the initial parameters: -
600: Initial guess for partial sill (\(C_1\)) - 200/-log(0.05):
Initial guess for range (\(a\)) - At
200m, correlation drops to 5% - \(a =
200/2.996 = 66.7\)m
plot(vario.DBH,
ylim = c(0, 1200),
main = "DBH Variogram",
xlab = "Distance (m)",
ylab = "Semivariance (cm²)")
# Add fitted model
lines(fit.DBH, col = "black", lwd = 2)
# Add reference lines
abline(h = fit.DBH$nugget, col = "blue", lwd = 2, lty = 2)
abline(h = fit.DBH$cov.pars[1] + fit.DBH$nugget, col = "green", lwd = 2, lty = 2)
abline(v = -log(0.05) * fit.DBH$cov.pars[2], col = "red", lwd = 2, lty = 2)
# Add legend
legend("bottomright",
legend = c("Empirical", "Fitted", "Nugget", "Sill", "Practical Range"),
col = c("black", "black", "blue", "green", "red"),
lty = c(NA, 1, 2, 2, 2),
pch = c(1, NA, NA, NA, NA))
What the lines mean: - Blue: Nugget (\(C_0\)) - Variation at zero distance - Green: Sill (\(C_0 + C_1\)) - Total variance - Red: Practical Range (\(3a\)) - Maximum spatial influence
# Fit linear model (DBH ~ Species)
lm.DBH <- lm(DBH ~ Species, data = WEF.dat)
summary(lm.DBH)
##
## Call:
## lm(formula = DBH ~ Species, data = WEF.dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -78.423 -9.969 -3.561 10.924 118.277
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 89.423 1.303 68.629 <2e-16 ***
## SpeciesGF -51.598 4.133 -12.483 <2e-16 ***
## SpeciesNF -5.873 15.744 -0.373 0.709
## SpeciesSF -68.347 1.461 -46.784 <2e-16 ***
## SpeciesWH -48.062 1.636 -29.377 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.19 on 1950 degrees of freedom
## Multiple R-squared: 0.5332, Adjusted R-squared: 0.5323
## F-statistic: 556.9 on 4 and 1950 DF, p-value: < 2.2e-16
# Extract residuals
DBH.resid <- resid(lm.DBH)
# View residuals
head(DBH.resid)
## 171 172 173 174 175 176
## 32.17690 18.57690 19.57690 31.43877 22.63877 21.87690
Why residual analysis? - Residuals = What’s left unexplained by the model - If residuals show spatial patterns → Model is missing spatial predictors - If residuals are spatially random → Model is adequate
# Compute variogram for residuals
vario.DBH.resid <- variog(coords = coords,
data = DBH.resid,
uvec = seq(0, max.dist, length = bins))
## variog: computing omnidirectional variogram
# Fit model to residuals
fit.DBH.resid <- variofit(vario.DBH.resid,
ini.cov.pars = c(300, 200/-log(0.05)),
cov.model = "exponential",
minimisation.function = "nls",
weights = "equal")
## variofit: covariance model used is exponential
## variofit: weights used: equal
## variofit: minimisation function used: nls
# View fitted parameters
print(fit.DBH.resid)
## variofit: model parameters estimated by OLS (ordinary least squares):
## covariance model is: exponential
## parameter estimates:
## tausq sigmasq phi
## 332.2388 107.6171 17.7166
## Practical Range with cor=0.05 for asymptotic range: 53.07418
##
## variofit: minimised sum of squares = 5371.325
par(mfrow = c(1, 2))
# Original DBH
plot(vario.DBH,
ylim = c(0, 1200),
main = "Original DBH",
xlab = "Distance (m)",
ylab = "Semivariance (cm²)")
lines(fit.DBH, col = "black", lwd = 2)
abline(h = fit.DBH$nugget, col = "blue", lwd = 2, lty = 2)
abline(h = fit.DBH$cov.pars[1] + fit.DBH$nugget, col = "green", lwd = 2, lty = 2)
abline(v = -log(0.05) * fit.DBH$cov.pars[2], col = "red", lwd = 2, lty = 2)
# Residuals
plot(vario.DBH.resid,
ylim = c(0, 500),
main = "Residuals",
xlab = "Distance (m)",
ylab = "Semivariance (cm²)")
lines(fit.DBH.resid, col = "black", lwd = 2)
abline(h = fit.DBH.resid$nugget, col = "blue", lwd = 2, lty = 2)
abline(h = fit.DBH.resid$cov.pars[1] + fit.DBH.resid$nugget, col = "green", lwd = 2, lty = 2)
abline(v = -log(0.05) * fit.DBH.resid$cov.pars[2], col = "red", lwd = 2, lty = 2)
# Calculate key statistics
original_sill <- fit.DBH$cov.pars[1] + fit.DBH$nugget
residual_sill <- fit.DBH.resid$cov.pars[1] + fit.DBH.resid$nugget
# Percentage of spatial variation explained by species
species_explained <- (original_sill - residual_sill) / original_sill * 100
# Create comparison table
comparison <- data.frame(
Parameter = c("Nugget (C₀)",
"Partial Sill (C₁)",
"Sill (C₀ + C₁)",
"Range (a)",
"Practical Range (3a)"),
Original = c(round(fit.DBH$nugget, 2),
round(fit.DBH$cov.pars[1], 2),
round(original_sill, 2),
round(fit.DBH$cov.pars[2], 2),
round(-log(0.05) * fit.DBH$cov.pars[2], 2)),
Residuals = c(round(fit.DBH.resid$nugget, 2),
round(fit.DBH.resid$cov.pars[1], 2),
round(residual_sill, 2),
round(fit.DBH.resid$cov.pars[2], 2),
round(-log(0.05) * fit.DBH.resid$cov.pars[2], 2))
)
print(comparison)
## Parameter Original Residuals
## 1 Nugget (C₀) 542.27 332.24
## 2 Partial Sill (C₁) 456.00 107.62
## 3 Sill (C₀ + C₁) 998.27 439.86
## 4 Range (a) 12.71 17.72
## 5 Practical Range (3a) 38.07 53.07
# Interpretation
cat("\nSpecies explains", round(species_explained, 1),
"% of the spatial variation in DBH\n")
##
## Species explains 55.9 % of the spatial variation in DBH
| Component | Category | Criteria | Value |
|---|---|---|---|
| Species Effect | Strength | 55.9% > 70%? | MODERATE |
| Species Effect | Interpretation | 55.9% > 70%? | Species and environment both important |
| Species Effect | Action | 55.9% > 70%? | Consider species + environment |
| Residual Structure | Strength | 44.1% > 40%? | STRONG |
| Residual Structure | Interpretation | 44.1% > 40%? | Model needs additional environmental predictors |
| Residual Structure | Action | 44.1% > 40%? | Add environmental predictors; Consider spatial regression |
| Area | Category | Recommendation |
|---|---|---|
| Sampling Design | Independent Samples | Place plots at least 53 m apart |
| Sampling Design | Spatial Interpolation | Sample within 53 m for interpolation |
| Model Improvement | Predictors | Add soil nutrients, moisture, elevation |
| Model Improvement | Competition | Include competition indices |
| Model Improvement | Interactions | Test interaction terms (Species × Environment) |
| Spatial Methods | Prediction | Use kriging for spatial prediction |
| Spatial Methods | Advanced Method | Consider universal kriging with covariates |
| Spatial Methods | Diagnostic | Test for anisotropy (different ranges in different directions) |
# Check for anisotropy in different directions
# Note: This requires the gstat package
library(gstat)
# Convert to SpatialPointsDataFrame
coordinates(WEF.dat) <- ~ X + Y
# Compute directional variograms
vario_directional <- variogram(DBH ~ 1,
data = WEF.dat,
alpha = c(0, 45, 90, 135))
# Plot directional variograms
plot(vario_directional,
main = "Directional Variograms",
xlab = "Distance (m)",
ylab = "Semivariance")
# Create residual surface
surf_resid <- mba.surf(cbind(coords, DBH.resid),
no.X = x.res,
no.Y = y.res,
h = 5,
m = 2,
extend = FALSE)$xyz.est
# Plot residual map
image.plot(surf_resid,
xaxs = "r",
yaxs = "r",
xlab = "Easting (m)",
ylab = "Northing (m)",
col = colorRampPalette(c("blue", "white", "red"))(25),
main = "Spatial Distribution of Residuals")
contour(surf_resid, add = TRUE)
# Add tree locations
points(coords, pch = 16, cex = 0.5)
# Kriging cross-validation
# Requires geoR package
# Prepare data for kriging
geo_data <- as.geodata(data.frame(coords, data = DBH))
# Perform kriging with cross-validation
cv <- krige.cv(geo_data,
model = fit.DBH,
nsim = 100)
# View cross-validation results
summary(cv)
# Plot observed vs predicted
plot(DBH, cv$predicted,
xlab = "Observed DBH (cm)",
ylab = "Predicted DBH (cm)",
main = "Kriging Cross-Validation")
abline(0, 1, col = "red", lwd = 2)
| Step | Action |
|---|---|
| 1. EXPLORATORY ANALYSIS | Visualize data on map |
| Check species differences | |
| Identify spatial patterns | |
| 2. VARIAGRAM ANALYSIS | Compute empirical variogram |
| Fit theoretical model (exponential) | |
| Extract parameters (nugget, sill, range) | |
| 3. MODELING | Fit linear model (DBH ~ Species) |
| Extract residuals | |
| Compute residual variogram | |
| 4. MODEL EVALUATION | Compare original vs residual variograms |
| Calculate species contribution to spatial pattern | |
| Assess remaining spatial structure | |
| 5. DECISION MAKING | If residuals show structure → Add predictors |
| If residuals are random → Model is adequate | |
| Use parameters for sampling design |
| Function | Package | Purpose |
|---|---|---|
variog() |
geoR | Compute empirical variogram |
variofit() |
geoR | Fit theoretical variogram model |
mba.surf() |
MBA | Spatial interpolation |
image.plot() |
fields | Plot spatial surfaces |
lm() |
stats | Linear model |
resid() |
stats | Extract residuals |
| Symbol | Parameter | Value | Description | Key.Insight |
|---|---|---|---|---|
| C₀ | Nugget | 542.27 | Measurement error + microscale variation | Variation at distances < min sampling interval |
| C₁ | Partial Sill | 456.00 | Spatially structured variance | Variation explained by distance |
| C₀ + C₁ | Sill | 998.27 | Total variance in the data | Plateau where semivariance stabilizes |
| a | Range | 12.71 | Distance to 63% of sill | Rate of spatial decay |
| 3a | Practical Range | 38.07 | Distance to 95% of sill | Maximum distance of spatial influence |