1. Introduction to Variography

What is a Variogram?

The variogram (or semivariogram) is a fundamental tool in geostatistics that quantifies spatial autocorrelation. It measures how similarity between observations changes with distance.

The Semivariance Formula: \[\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} [Z(x_i) - Z(x_i + h)]^2\]

Where: - \(\gamma(h)\) = semivariance at lag distance \(h\) - \(N(h)\) = number of pairs at distance \(h\) - \(Z(x_i)\) = value at location \(x_i\) - \(Z(x_i + h)\) = value at location separated by distance \(h\)

Key Variogram Parameters

Parameter Symbol Description
Nugget \(C_0\) Y-intercept, measurement error + microscale variation
Partial Sill \(C_1\) Spatially structured variance
Sill \(C_0 + C_1\) Total variance (plateau)
Range \(a\) Distance where 63% of sill reached
Practical Range \(3a\) Distance where 95% of sill reached

The Exponential Model

\[\gamma(h) = C_0 + C_1(1 - e^{-h/a})\]

  • At h = 0: \(\gamma(0) = C_0\) (nugget)
  • At h = a: \(\gamma(a) = C_0 + 0.632C_1\) (63% of sill)
  • At h = 3a: \(\gamma(3a) = C_0 + 0.95C_1\) (95% of sill)
  • At h → ∞: \(\gamma(\infty) = C_0 + C_1\) (sill)

2. Load Required Libraries and Explore WEF.dat Data

library(spBayes)      # For WEF.dat data
library(classInt)     # For class intervals
library(RColorBrewer) # For color palettes
library(MBA)          # For spatial interpolation
library(fields)       # For image plotting
library(geoR)         # For variogram analysis
library(sp)           # For spatial data handling
library(gstat)        # For variogram modeling (more stable)
library(MASS)         # For mvrnorm
library(knitr)
# Optional: Color palette function
col.br <- function(n) {
  colorRampPalette(c("blue", "cyan", "yellow", "red"))(n)
}

# Load data from spBayes

data(WEF.dat)

# Clean data - remove rows with NA values
WEF.dat <- WEF.dat[!apply(WEF.dat[,c("East_m","North_m", "DBH_cm","Tree_height_m","ELEV_m")], 
                         1, function(x) any(is.na(x))),]

# View data structure
cat("Data dimensions:", dim(WEF.dat), "\n")
## Data dimensions: 1955 15
cat("Variable names:", names(WEF.dat), "\n\n")
## Variable names: Tree_ID Species DBH_cm Live_dead Stump North_m East_m ELEV_m Tree_height_m Crown_depth_m Mean_crown_radius_m_from_center Crown_radius_north_m_from_center Crown_radius_east_m_from_center Crown_radius_west_m_from_center Crown_radius_south_m_from_center
# Summary statistics
summary(WEF.dat[, c("DBH_cm", "Tree_height_m", "ELEV_m")])
##      DBH_cm       Tree_height_m       ELEV_m    
##  Min.   :  5.40   Min.   : 2.86   Min.   :1100  
##  1st Qu.: 14.35   1st Qu.:11.96   1st Qu.:1107  
##  Median : 23.20   Median :17.72   Median :1109  
##  Mean   : 36.77   Mean   :23.66   Mean   :1109  
##  3rd Qu.: 47.00   3rd Qu.:35.00   3rd Qu.:1113  
##  Max.   :207.70   Max.   :62.87   Max.   :1118
# Extract variables for analysis
coords <- as.matrix(WEF.dat[, c("East_m", "North_m")])
DBH <- WEF.dat$DBH_cm
Height <- WEF.dat$Tree_height_m
Elevation <- WEF.dat$ELEV_m

# Check if Species exists
has_species <- "Species" %in% names(WEF.dat)
if(has_species) {
  Species <- WEF.dat$Species
  cat("\nSpecies present:", unique(Species), "\n")
  cat("Species counts:\n")
  print(table(Species))
}
## 
## Species present: 1 6 4 3 2 
## Species counts:
## Species
##   DF   GF   NF   SF  UNK   WH 
##  290   32    2 1128    0  503
# First few rows
head(WEF.dat)
##     Tree_ID Species DBH_cm Live_dead Stump North_m East_m  ELEV_m Tree_height_m
## 171       9      DF  121.6         L         90.92 221.70 1113.12         54.90
## 172      10      DF  108.0         L         92.72 220.87 1113.40         55.96
## 173      11      DF  109.0         L         97.70 220.87 1113.20         49.06
## 174      12      WH   72.8         L         99.48 215.76 1113.07         44.44
## 175      17      WH   64.0         L        104.57 224.64 1113.34         47.10
## 176      21      DF  111.3         L         98.29 229.13 1113.06         54.35
##     Crown_depth_m Mean_crown_radius_m_from_center
## 171         31.70                            2.98
## 172         29.45                            1.73
## 173         18.28                            3.42
## 174         38.72                            4.44
## 175         33.91                            4.16
## 176         14.27                            2.78
##     Crown_radius_north_m_from_center Crown_radius_east_m_from_center
## 171                             1.41                            2.58
## 172                             1.47                            1.24
## 173                             5.40                            2.96
## 174                             4.40                            3.01
## 175                             7.39                            2.59
## 176                             2.01                            1.93
##     Crown_radius_west_m_from_center Crown_radius_south_m_from_center
## 171                            4.05                             3.90
## 172                            2.34                             1.88
## 173                            2.99                             2.33
## 174                            5.60                             4.73
## 175                            3.73                             2.94
## 176                            3.46                             3.71

4. Exploratory Data Analysis

# Set up plotting
par(mfrow = c(2, 2))

# DBH distribution
hist(DBH, 
     main = "DBH Distribution", 
     xlab = "DBH (cm)", 
     col = "lightblue",
     breaks = 20)

# Boxplot of DBH by species (if available)
if(has_species) {
  boxplot(DBH ~ Species, data = WEF.dat,
          main = "DBH by Species",
          xlab = "Species", 
          ylab = "DBH (cm)",
          col = brewer.pal(length(unique(Species)), "Set2"))
} else {
  boxplot(DBH, 
          main = "DBH Boxplot", 
          ylab = "DBH (cm)", 
          col = "lightblue")
}

# Spatial distribution of trees
plot(coords, 
     pch = 16, 
     cex = 0.8, 
     xlab = "Easting (m)", 
     ylab = "Northing (m)",
     main = "Tree Locations",
     col = rgb(0, 0, 1, 0.5))
title(sub = paste("n =", nrow(WEF.dat)))

# DBH vs Elevation
plot(Elevation, DBH,
     xlab = "Elevation (m)", 
     ylab = "DBH (cm)",
     main = "DBH vs Elevation",
     pch = 16, 
     col = rgb(0, 0, 1, 0.3))

# Add smoothing line if enough points
if(nrow(WEF.dat) > 10) {
  fit_elev <- loess(DBH ~ Elevation, data = WEF.dat)
  elev_order <- order(Elevation)
  lines(Elevation[elev_order], 
        fit_elev$fitted[elev_order], 
        col = "red", lwd = 2)
}


4. Visualize the Data

4.1 Spatial Map of DBH

# Create interpolation surface
x.res <- 100
y.res <- 100
surf <- mba.surf(cbind(coords, DBH), 
                 no.X = x.res, 
                 no.Y = y.res, 
                 h = 5, 
                 m = 2, 
                 extend = FALSE)$xyz.est

# Plot
image.plot(surf, 
           xaxs = "r", 
           yaxs = "r", 
           xlab = "Easting (m)", 
           ylab = "Northing (m)", 
           col = col.br(25),
           main = "DBH Spatial Distribution")

contour(surf, add = TRUE)

# Add tree locations
points(coords, pch = 16, cex = 0.5)

4.2 DBH by Species

boxplot(DBH ~ Species, data = WEF.dat,
        main = "DBH by Species",
        xlab = "Species",
        ylab = "DBH (cm)",
        col = c("lightblue", "lightgreen", "lightcoral"))


5. Step-by-Step Variogram Analysis

5.1 Compute the Empirical Variogram

# Maximum distance for variogram (25% of maximum pairwise distance)
max.dist <- 0.25 * max(dist(coords))
bins <- 50

# Compute variogram for original DBH data
vario.DBH <- variog(coords = coords, 
                    data = DBH, 
                    uvec = seq(0, max.dist, length = bins))
## variog: computing omnidirectional variogram
# View the results
str(vario.DBH)
## List of 20
##  $ u               : num [1:50] 0 1.8 3.6 5.4 7.2 ...
##  $ v               : num [1:50] 371 526 749 874 873 ...
##  $ n               : num [1:50] 127 1134 2241 3192 4225 ...
##  $ sd              : num [1:50] 913 1372 1775 1870 1938 ...
##  $ bins.lim        : num [1:51] 1.0e-12 9.0e-01 2.7 4.5 6.3 ...
##  $ ind.bin         : logi [1:50] TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ var.mark        : num 1053
##  $ beta.ols        : num 36.8
##  $ output.type     : chr "bin"
##  $ max.dist        : num 89.1
##  $ estimator.type  : chr "classical"
##  $ n.data          : int 1955
##  $ lambda          : num 1
##  $ trend           : chr "cte"
##  $ pairs.min       : num 2
##  $ nugget.tolerance: num 1e-12
##  $ direction       : chr "omnidirectional"
##  $ tolerance       : chr "none"
##  $ uvec            : num [1:50] 0 1.8 3.6 5.4 7.2 ...
##  $ call            : language variog(coords = coords, data = DBH, uvec = seq(0, max.dist, length = bins))
##  - attr(*, "class")= chr "variogram"

What this does: 1. Calculates all pairwise distances between trees 2. Groups pairs into 50 distance bins 3. Computes semivariance for each bin: - Average squared difference / 2

5.2 Fit the Exponential Model

# Fit exponential model to original data
fit.DBH <- variofit(vario.DBH, 
                    ini.cov.pars = c(600, 200/-log(0.05)), 
                    cov.model = "exponential", 
                    minimisation.function = "nls", 
                    weights = "equal")
## variofit: covariance model used is exponential 
## variofit: weights used: equal 
## variofit: minimisation function used: nls
# View fitted parameters
print(fit.DBH)
## variofit: model parameters estimated by OLS (ordinary least squares):
## covariance model is: exponential
## parameter estimates:
##    tausq  sigmasq      phi 
## 542.2718 455.9963  12.7095 
## Practical Range with cor=0.05 for asymptotic range: 38.07425
## 
## variofit: minimised sum of squares = 151227.4

Understanding the initial parameters: - 600: Initial guess for partial sill (\(C_1\)) - 200/-log(0.05): Initial guess for range (\(a\)) - At 200m, correlation drops to 5% - \(a = 200/2.996 = 66.7\)m

5.3 Plot the Variogram

plot(vario.DBH, 
     ylim = c(0, 1200), 
     main = "DBH Variogram",
     xlab = "Distance (m)",
     ylab = "Semivariance (cm²)")

# Add fitted model
lines(fit.DBH, col = "black", lwd = 2)

# Add reference lines
abline(h = fit.DBH$nugget, col = "blue", lwd = 2, lty = 2)
abline(h = fit.DBH$cov.pars[1] + fit.DBH$nugget, col = "green", lwd = 2, lty = 2)
abline(v = -log(0.05) * fit.DBH$cov.pars[2], col = "red", lwd = 2, lty = 2)

# Add legend
legend("bottomright", 
       legend = c("Empirical", "Fitted", "Nugget", "Sill", "Practical Range"),
       col = c("black", "black", "blue", "green", "red"),
       lty = c(NA, 1, 2, 2, 2),
       pch = c(1, NA, NA, NA, NA))

What the lines mean: - Blue: Nugget (\(C_0\)) - Variation at zero distance - Green: Sill (\(C_0 + C_1\)) - Total variance - Red: Practical Range (\(3a\)) - Maximum spatial influence


6. Residual Analysis (Checking Model Adequacy)

6.1 Fit Linear Model and Extract Residuals

# Fit linear model (DBH ~ Species)
lm.DBH <- lm(DBH ~ Species, data = WEF.dat)
summary(lm.DBH)
## 
## Call:
## lm(formula = DBH ~ Species, data = WEF.dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -78.423  -9.969  -3.561  10.924 118.277 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   89.423      1.303  68.629   <2e-16 ***
## SpeciesGF    -51.598      4.133 -12.483   <2e-16 ***
## SpeciesNF     -5.873     15.744  -0.373    0.709    
## SpeciesSF    -68.347      1.461 -46.784   <2e-16 ***
## SpeciesWH    -48.062      1.636 -29.377   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.19 on 1950 degrees of freedom
## Multiple R-squared:  0.5332, Adjusted R-squared:  0.5323 
## F-statistic: 556.9 on 4 and 1950 DF,  p-value: < 2.2e-16
# Extract residuals
DBH.resid <- resid(lm.DBH)

# View residuals
head(DBH.resid)
##      171      172      173      174      175      176 
## 32.17690 18.57690 19.57690 31.43877 22.63877 21.87690

Why residual analysis? - Residuals = What’s left unexplained by the model - If residuals show spatial patterns → Model is missing spatial predictors - If residuals are spatially random → Model is adequate

6.2 Compute Variogram of Residuals

# Compute variogram for residuals
vario.DBH.resid <- variog(coords = coords, 
                          data = DBH.resid, 
                          uvec = seq(0, max.dist, length = bins))
## variog: computing omnidirectional variogram
# Fit model to residuals
fit.DBH.resid <- variofit(vario.DBH.resid, 
                          ini.cov.pars = c(300, 200/-log(0.05)), 
                          cov.model = "exponential", 
                          minimisation.function = "nls", 
                          weights = "equal")
## variofit: covariance model used is exponential 
## variofit: weights used: equal 
## variofit: minimisation function used: nls
# View fitted parameters
print(fit.DBH.resid)
## variofit: model parameters estimated by OLS (ordinary least squares):
## covariance model is: exponential
## parameter estimates:
##    tausq  sigmasq      phi 
## 332.2388 107.6171  17.7166 
## Practical Range with cor=0.05 for asymptotic range: 53.07418
## 
## variofit: minimised sum of squares = 5371.325

6.3 Compare Original vs Residual Variograms

par(mfrow = c(1, 2))

# Original DBH
plot(vario.DBH, 
     ylim = c(0, 1200), 
     main = "Original DBH",
     xlab = "Distance (m)",
     ylab = "Semivariance (cm²)")
lines(fit.DBH, col = "black", lwd = 2)
abline(h = fit.DBH$nugget, col = "blue", lwd = 2, lty = 2)
abline(h = fit.DBH$cov.pars[1] + fit.DBH$nugget, col = "green", lwd = 2, lty = 2)
abline(v = -log(0.05) * fit.DBH$cov.pars[2], col = "red", lwd = 2, lty = 2)

# Residuals
plot(vario.DBH.resid, 
     ylim = c(0, 500), 
     main = "Residuals",
     xlab = "Distance (m)",
     ylab = "Semivariance (cm²)")
lines(fit.DBH.resid, col = "black", lwd = 2)
abline(h = fit.DBH.resid$nugget, col = "blue", lwd = 2, lty = 2)
abline(h = fit.DBH.resid$cov.pars[1] + fit.DBH.resid$nugget, col = "green", lwd = 2, lty = 2)
abline(v = -log(0.05) * fit.DBH.resid$cov.pars[2], col = "red", lwd = 2, lty = 2)

6.4 Quantitative Comparison

# Calculate key statistics
original_sill <- fit.DBH$cov.pars[1] + fit.DBH$nugget
residual_sill <- fit.DBH.resid$cov.pars[1] + fit.DBH.resid$nugget

# Percentage of spatial variation explained by species
species_explained <- (original_sill - residual_sill) / original_sill * 100

# Create comparison table
comparison <- data.frame(
  Parameter = c("Nugget (C₀)", 
                "Partial Sill (C₁)", 
                "Sill (C₀ + C₁)", 
                "Range (a)", 
                "Practical Range (3a)"),
  Original = c(round(fit.DBH$nugget, 2),
               round(fit.DBH$cov.pars[1], 2),
               round(original_sill, 2),
               round(fit.DBH$cov.pars[2], 2),
               round(-log(0.05) * fit.DBH$cov.pars[2], 2)),
  Residuals = c(round(fit.DBH.resid$nugget, 2),
                round(fit.DBH.resid$cov.pars[1], 2),
                round(residual_sill, 2),
                round(fit.DBH.resid$cov.pars[2], 2),
                round(-log(0.05) * fit.DBH.resid$cov.pars[2], 2))
)

print(comparison)
##              Parameter Original Residuals
## 1          Nugget (C₀)   542.27    332.24
## 2    Partial Sill (C₁)   456.00    107.62
## 3       Sill (C₀ + C₁)   998.27    439.86
## 4            Range (a)    12.71     17.72
## 5 Practical Range (3a)    38.07     53.07
# Interpretation
cat("\nSpecies explains", round(species_explained, 1), 
    "% of the spatial variation in DBH\n")
## 
## Species explains 55.9 % of the spatial variation in DBH

7. Interpretation Guide

7.1 What the Results Tell You

Detailed Interpretation of Variogram Results with Criteria
Component Category Criteria Value
Species Effect Strength 55.9% > 70%? MODERATE
Species Effect Interpretation 55.9% > 70%? Species and environment both important
Species Effect Action 55.9% > 70%? Consider species + environment
Residual Structure Strength 44.1% > 40%? STRONG
Residual Structure Interpretation 44.1% > 40%? Model needs additional environmental predictors
Residual Structure Action 44.1% > 40%? Add environmental predictors; Consider spatial regression

7.2 Practical Recommendations

Recommendations Based on Variogram Analysis
Area Category Recommendation
Sampling Design Independent Samples Place plots at least 53 m apart
Sampling Design Spatial Interpolation Sample within 53 m for interpolation
Model Improvement Predictors Add soil nutrients, moisture, elevation
Model Improvement Competition Include competition indices
Model Improvement Interactions Test interaction terms (Species × Environment)
Spatial Methods Prediction Use kriging for spatial prediction
Spatial Methods Advanced Method Consider universal kriging with covariates
Spatial Methods Diagnostic Test for anisotropy (different ranges in different directions)

8. Advanced Topics

8.1 Directional Variograms (Anisotropy)

# Check for anisotropy in different directions
# Note: This requires the gstat package
library(gstat)

# Convert to SpatialPointsDataFrame
coordinates(WEF.dat) <- ~ X + Y

# Compute directional variograms
vario_directional <- variogram(DBH ~ 1, 
                               data = WEF.dat,
                               alpha = c(0, 45, 90, 135))

# Plot directional variograms
plot(vario_directional, 
     main = "Directional Variograms",
     xlab = "Distance (m)",
     ylab = "Semivariance")

8.2 Residual Maps

# Create residual surface
surf_resid <- mba.surf(cbind(coords, DBH.resid), 
                       no.X = x.res, 
                       no.Y = y.res, 
                       h = 5, 
                       m = 2, 
                       extend = FALSE)$xyz.est

# Plot residual map
image.plot(surf_resid, 
           xaxs = "r", 
           yaxs = "r", 
           xlab = "Easting (m)", 
           ylab = "Northing (m)", 
           col = colorRampPalette(c("blue", "white", "red"))(25),
           main = "Spatial Distribution of Residuals")

contour(surf_resid, add = TRUE)

# Add tree locations
points(coords, pch = 16, cex = 0.5)

8.3 Cross-Validation

# Kriging cross-validation
# Requires geoR package

# Prepare data for kriging
geo_data <- as.geodata(data.frame(coords, data = DBH))

# Perform kriging with cross-validation
cv <- krige.cv(geo_data, 
               model = fit.DBH,
               nsim = 100)

# View cross-validation results
summary(cv)

# Plot observed vs predicted
plot(DBH, cv$predicted,
     xlab = "Observed DBH (cm)",
     ylab = "Predicted DBH (cm)",
     main = "Kriging Cross-Validation")
abline(0, 1, col = "red", lwd = 2)

9. Complete Workflow Summary

Detailed Spatial Analysis Workflow
Step Action
1. EXPLORATORY ANALYSIS Visualize data on map
Check species differences
Identify spatial patterns
2. VARIAGRAM ANALYSIS Compute empirical variogram
Fit theoretical model (exponential)
Extract parameters (nugget, sill, range)
3. MODELING Fit linear model (DBH ~ Species)
Extract residuals
Compute residual variogram
4. MODEL EVALUATION Compare original vs residual variograms
Calculate species contribution to spatial pattern
Assess remaining spatial structure
5. DECISION MAKING If residuals show structure → Add predictors
If residuals are random → Model is adequate
Use parameters for sampling design


Appendix: Quick Reference

R Functions Used

Function Package Purpose
variog() geoR Compute empirical variogram
variofit() geoR Fit theoretical variogram model
mba.surf() MBA Spatial interpolation
image.plot() fields Plot spatial surfaces
lm() stats Linear model
resid() stats Extract residuals

Variogram Parameters Summary

Variogram Parameters Quick Reference
Symbol Parameter Value Description Key.Insight
C₀ Nugget 542.27 Measurement error + microscale variation Variation at distances < min sampling interval
C₁ Partial Sill 456.00 Spatially structured variance Variation explained by distance
C₀ + C₁ Sill 998.27 Total variance in the data Plateau where semivariance stabilizes
a Range 12.71 Distance to 63% of sill Rate of spatial decay
3a Practical Range 38.07 Distance to 95% of sill Maximum distance of spatial influence