Spatial Variogram Analysis for Tree Diameter Data (booknotes)

1. Introduction to Variography

What is a Variogram?

The variogram (or semivariogram) is a fundamental tool in geostatistics that quantifies spatial autocorrelation. It measures how similarity between observations changes with distance.

The Semivariance Formula: \[\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} [Z(x_i) - Z(x_i + h)]^2\]

Where: - \(\gamma(h)\) = semivariance at lag distance \(h\) - \(N(h)\) = number of pairs at distance \(h\) - \(Z(x_i)\) = value at location \(x_i\) - \(Z(x_i + h)\) = value at location separated by distance \(h\)

Key Variogram Parameters

Parameter	Symbol	Description
Nugget	\(C_0\)	Y-intercept, measurement error + microscale variation
Partial Sill	\(C_1\)	Spatially structured variance
Sill	\(C_0 + C_1\)	Total variance (plateau)
Range	\(a\)	Distance where 63% of sill reached
Practical Range	\(3a\)	Distance where 95% of sill reached

The Exponential Model

\[\gamma(h) = C_0 + C_1(1 - e^{-h/a})\]

At h = 0: \(\gamma(0) = C_0\) (nugget)
At h = a: \(\gamma(a) = C_0 + 0.632C_1\) (63% of sill)
At h = 3a: \(\gamma(3a) = C_0 + 0.95C_1\) (95% of sill)
At h → ∞: \(\gamma(\infty) = C_0 + C_1\) (sill)

2. Load Required Libraries and Explore WEF.dat Data

library(spBayes)      # For WEF.dat data
library(classInt)     # For class intervals
library(RColorBrewer) # For color palettes
library(MBA)          # For spatial interpolation
library(fields)       # For image plotting
library(geoR)         # For variogram analysis
library(sp)           # For spatial data handling
library(gstat)        # For variogram modeling (more stable)
library(MASS)         # For mvrnorm
library(knitr)
# Optional: Color palette function
col.br <- function(n) {
  colorRampPalette(c("blue", "cyan", "yellow", "red"))(n)
}

# Load data from spBayes

data(WEF.dat)

# Clean data - remove rows with NA values
WEF.dat <- WEF.dat[!apply(WEF.dat[,c("East_m","North_m", "DBH_cm","Tree_height_m","ELEV_m")], 
                         1, function(x) any(is.na(x))),]

# View data structure
cat("Data dimensions:", dim(WEF.dat), "\n")

## Data dimensions: 1955 15

cat("Variable names:", names(WEF.dat), "\n\n")

## Variable names: Tree_ID Species DBH_cm Live_dead Stump North_m East_m ELEV_m Tree_height_m Crown_depth_m Mean_crown_radius_m_from_center Crown_radius_north_m_from_center Crown_radius_east_m_from_center Crown_radius_west_m_from_center Crown_radius_south_m_from_center

# Summary statistics
summary(WEF.dat[, c("DBH_cm", "Tree_height_m", "ELEV_m")])

##      DBH_cm       Tree_height_m       ELEV_m    
##  Min.   :  5.40   Min.   : 2.86   Min.   :1100  
##  1st Qu.: 14.35   1st Qu.:11.96   1st Qu.:1107  
##  Median : 23.20   Median :17.72   Median :1109  
##  Mean   : 36.77   Mean   :23.66   Mean   :1109  
##  3rd Qu.: 47.00   3rd Qu.:35.00   3rd Qu.:1113  
##  Max.   :207.70   Max.   :62.87   Max.   :1118

# Extract variables for analysis
coords <- as.matrix(WEF.dat[, c("East_m", "North_m")])
DBH <- WEF.dat$DBH_cm
Height <- WEF.dat$Tree_height_m
Elevation <- WEF.dat$ELEV_m

# Check if Species exists
has_species <- "Species" %in% names(WEF.dat)
if(has_species) {
  Species <- WEF.dat$Species
  cat("\nSpecies present:", unique(Species), "\n")
  cat("Species counts:\n")
  print(table(Species))
}

## 
## Species present: 1 6 4 3 2 
## Species counts:
## Species
##   DF   GF   NF   SF  UNK   WH 
##  290   32    2 1128    0  503

# First few rows
head(WEF.dat)

##     Tree_ID Species DBH_cm Live_dead Stump North_m East_m  ELEV_m Tree_height_m
## 171       9      DF  121.6         L         90.92 221.70 1113.12         54.90
## 172      10      DF  108.0         L         92.72 220.87 1113.40         55.96
## 173      11      DF  109.0         L         97.70 220.87 1113.20         49.06
## 174      12      WH   72.8         L         99.48 215.76 1113.07         44.44
## 175      17      WH   64.0         L        104.57 224.64 1113.34         47.10
## 176      21      DF  111.3         L         98.29 229.13 1113.06         54.35
##     Crown_depth_m Mean_crown_radius_m_from_center
## 171         31.70                            2.98
## 172         29.45                            1.73
## 173         18.28                            3.42
## 174         38.72                            4.44
## 175         33.91                            4.16
## 176         14.27                            2.78
##     Crown_radius_north_m_from_center Crown_radius_east_m_from_center
## 171                             1.41                            2.58
## 172                             1.47                            1.24
## 173                             5.40                            2.96
## 174                             4.40                            3.01
## 175                             7.39                            2.59
## 176                             2.01                            1.93
##     Crown_radius_west_m_from_center Crown_radius_south_m_from_center
## 171                            4.05                             3.90
## 172                            2.34                             1.88
## 173                            2.99                             2.33
## 174                            5.60                             4.73
## 175                            3.73                             2.94
## 176                            3.46                             3.71

4. Exploratory Data Analysis

# Set up plotting
par(mfrow = c(2, 2))

# DBH distribution
hist(DBH, 
     main = "DBH Distribution", 
     xlab = "DBH (cm)", 
     col = "lightblue",
     breaks = 20)

# Boxplot of DBH by species (if available)
if(has_species) {
  boxplot(DBH ~ Species, data = WEF.dat,
          main = "DBH by Species",
          xlab = "Species", 
          ylab = "DBH (cm)",
          col = brewer.pal(length(unique(Species)), "Set2"))
} else {
  boxplot(DBH, 
          main = "DBH Boxplot", 
          ylab = "DBH (cm)", 
          col = "lightblue")
}

# Spatial distribution of trees
plot(coords, 
     pch = 16, 
     cex = 0.8, 
     xlab = "Easting (m)", 
     ylab = "Northing (m)",
     main = "Tree Locations",
     col = rgb(0, 0, 1, 0.5))
title(sub = paste("n =", nrow(WEF.dat)))

# DBH vs Elevation
plot(Elevation, DBH,
     xlab = "Elevation (m)", 
     ylab = "DBH (cm)",
     main = "DBH vs Elevation",
     pch = 16, 
     col = rgb(0, 0, 1, 0.3))

# Add smoothing line if enough points
if(nrow(WEF.dat) > 10) {
  fit_elev <- loess(DBH ~ Elevation, data = WEF.dat)
  elev_order <- order(Elevation)
  lines(Elevation[elev_order], 
        fit_elev$fitted[elev_order], 
        col = "red", lwd = 2)
}

4. Visualize the Data

4.1 Spatial Map of DBH

# Create interpolation surface
x.res <- 100
y.res <- 100
surf <- mba.surf(cbind(coords, DBH), 
                 no.X = x.res, 
                 no.Y = y.res, 
                 h = 5, 
                 m = 2, 
                 extend = FALSE)$xyz.est

# Plot
image.plot(surf, 
           xaxs = "r", 
           yaxs = "r", 
           xlab = "Easting (m)", 
           ylab = "Northing (m)", 
           col = col.br(25),
           main = "DBH Spatial Distribution")

contour(surf, add = TRUE)

# Add tree locations
points(coords, pch = 16, cex = 0.5)

4.2 DBH by Species

boxplot(DBH ~ Species, data = WEF.dat,
        main = "DBH by Species",
        xlab = "Species",
        ylab = "DBH (cm)",
        col = c("lightblue", "lightgreen", "lightcoral"))

5. Step-by-Step Variogram Analysis

5.1 Compute the Empirical Variogram

# Maximum distance for variogram (25% of maximum pairwise distance)
max.dist <- 0.25 * max(dist(coords))
bins <- 50

# Compute variogram for original DBH data
vario.DBH <- variog(coords = coords, 
                    data = DBH, 
                    uvec = seq(0, max.dist, length = bins))

## variog: computing omnidirectional variogram

# View the results
str(vario.DBH)

## List of 20
##  $ u               : num [1:50] 0 1.8 3.6 5.4 7.2 ...
##  $ v               : num [1:50] 371 526 749 874 873 ...
##  $ n               : num [1:50] 127 1134 2241 3192 4225 ...
##  $ sd              : num [1:50] 913 1372 1775 1870 1938 ...
##  $ bins.lim        : num [1:51] 1.0e-12 9.0e-01 2.7 4.5 6.3 ...
##  $ ind.bin         : logi [1:50] TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ var.mark        : num 1053
##  $ beta.ols        : num 36.8
##  $ output.type     : chr "bin"
##  $ max.dist        : num 89.1
##  $ estimator.type  : chr "classical"
##  $ n.data          : int 1955
##  $ lambda          : num 1
##  $ trend           : chr "cte"
##  $ pairs.min       : num 2
##  $ nugget.tolerance: num 1e-12
##  $ direction       : chr "omnidirectional"
##  $ tolerance       : chr "none"
##  $ uvec            : num [1:50] 0 1.8 3.6 5.4 7.2 ...
##  $ call            : language variog(coords = coords, data = DBH, uvec = seq(0, max.dist, length = bins))
##  - attr(*, "class")= chr "variogram"

What this does: 1. Calculates all pairwise distances between trees 2. Groups pairs into 50 distance bins 3. Computes semivariance for each bin: - Average squared difference / 2

5.2 Fit the Exponential Model

# Fit exponential model to original data
fit.DBH <- variofit(vario.DBH, 
                    ini.cov.pars = c(600, 200/-log(0.05)), 
                    cov.model = "exponential", 
                    minimisation.function = "nls", 
                    weights = "equal")

## variofit: covariance model used is exponential 
## variofit: weights used: equal 
## variofit: minimisation function used: nls

# View fitted parameters
print(fit.DBH)

## variofit: model parameters estimated by OLS (ordinary least squares):
## covariance model is: exponential
## parameter estimates:
##    tausq  sigmasq      phi 
## 542.2718 455.9963  12.7095 
## Practical Range with cor=0.05 for asymptotic range: 38.07425
## 
## variofit: minimised sum of squares = 151227.4

Understanding the initial parameters: - 600: Initial guess for partial sill (\(C_1\)) - 200/-log(0.05): Initial guess for range (\(a\)) - At 200m, correlation drops to 5% - \(a = 200/2.996 = 66.7\)m

5.3 Plot the Variogram

plot(vario.DBH, 
     ylim = c(0, 1200), 
     main = "DBH Variogram",
     xlab = "Distance (m)",
     ylab = "Semivariance (cm²)")

# Add fitted model
lines(fit.DBH, col = "black", lwd = 2)

# Add reference lines
abline(h = fit.DBH$nugget, col = "blue", lwd = 2, lty = 2)
abline(h = fit.DBH$cov.pars[1] + fit.DBH$nugget, col = "green", lwd = 2, lty = 2)
abline(v = -log(0.05) * fit.DBH$cov.pars[2], col = "red", lwd = 2, lty = 2)

# Add legend
legend("bottomright", 
       legend = c("Empirical", "Fitted", "Nugget", "Sill", "Practical Range"),
       col = c("black", "black", "blue", "green", "red"),
       lty = c(NA, 1, 2, 2, 2),
       pch = c(1, NA, NA, NA, NA))

What the lines mean: - Blue: Nugget (\(C_0\)) - Variation at zero distance - Green: Sill (\(C_0 + C_1\)) - Total variance - Red: Practical Range (\(3a\)) - Maximum spatial influence

6. Residual Analysis (Checking Model Adequacy)

6.1 Fit Linear Model and Extract Residuals

# Fit linear model (DBH ~ Species)
lm.DBH <- lm(DBH ~ Species, data = WEF.dat)
summary(lm.DBH)

## 
## Call:
## lm(formula = DBH ~ Species, data = WEF.dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -78.423  -9.969  -3.561  10.924 118.277 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   89.423      1.303  68.629   <2e-16 ***
## SpeciesGF    -51.598      4.133 -12.483   <2e-16 ***
## SpeciesNF     -5.873     15.744  -0.373    0.709    
## SpeciesSF    -68.347      1.461 -46.784   <2e-16 ***
## SpeciesWH    -48.062      1.636 -29.377   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.19 on 1950 degrees of freedom
## Multiple R-squared:  0.5332, Adjusted R-squared:  0.5323 
## F-statistic: 556.9 on 4 and 1950 DF,  p-value: < 2.2e-16

# Extract residuals
DBH.resid <- resid(lm.DBH)

# View residuals
head(DBH.resid)

##      171      172      173      174      175      176 
## 32.17690 18.57690 19.57690 31.43877 22.63877 21.87690

Why residual analysis? - Residuals = What’s left unexplained by the model - If residuals show spatial patterns → Model is missing spatial predictors - If residuals are spatially random → Model is adequate

6.2 Compute Variogram of Residuals

# Compute variogram for residuals
vario.DBH.resid <- variog(coords = coords, 
                          data = DBH.resid, 
                          uvec = seq(0, max.dist, length = bins))

## variog: computing omnidirectional variogram

# Fit model to residuals
fit.DBH.resid <- variofit(vario.DBH.resid, 
                          ini.cov.pars = c(300, 200/-log(0.05)), 
                          cov.model = "exponential", 
                          minimisation.function = "nls", 
                          weights = "equal")

## variofit: covariance model used is exponential 
## variofit: weights used: equal 
## variofit: minimisation function used: nls

# View fitted parameters
print(fit.DBH.resid)

## variofit: model parameters estimated by OLS (ordinary least squares):
## covariance model is: exponential
## parameter estimates:
##    tausq  sigmasq      phi 
## 332.2388 107.6171  17.7166 
## Practical Range with cor=0.05 for asymptotic range: 53.07418
## 
## variofit: minimised sum of squares = 5371.325

6.3 Compare Original vs Residual Variograms

par(mfrow = c(1, 2))

# Original DBH
plot(vario.DBH, 
     ylim = c(0, 1200), 
     main = "Original DBH",
     xlab = "Distance (m)",
     ylab = "Semivariance (cm²)")
lines(fit.DBH, col = "black", lwd = 2)
abline(h = fit.DBH$nugget, col = "blue", lwd = 2, lty = 2)
abline(h = fit.DBH$cov.pars[1] + fit.DBH$nugget, col = "green", lwd = 2, lty = 2)
abline(v = -log(0.05) * fit.DBH$cov.pars[2], col = "red", lwd = 2, lty = 2)

# Residuals
plot(vario.DBH.resid, 
     ylim = c(0, 500), 
     main = "Residuals",
     xlab = "Distance (m)",
     ylab = "Semivariance (cm²)")
lines(fit.DBH.resid, col = "black", lwd = 2)
abline(h = fit.DBH.resid$nugget, col = "blue", lwd = 2, lty = 2)
abline(h = fit.DBH.resid$cov.pars[1] + fit.DBH.resid$nugget, col = "green", lwd = 2, lty = 2)
abline(v = -log(0.05) * fit.DBH.resid$cov.pars[2], col = "red", lwd = 2, lty = 2)

6.4 Quantitative Comparison

# Calculate key statistics
original_sill <- fit.DBH$cov.pars[1] + fit.DBH$nugget
residual_sill <- fit.DBH.resid$cov.pars[1] + fit.DBH.resid$nugget

# Percentage of spatial variation explained by species
species_explained <- (original_sill - residual_sill) / original_sill * 100

# Create comparison table
comparison <- data.frame(
  Parameter = c("Nugget (C₀)", 
                "Partial Sill (C₁)", 
                "Sill (C₀ + C₁)", 
                "Range (a)", 
                "Practical Range (3a)"),
  Original = c(round(fit.DBH$nugget, 2),
               round(fit.DBH$cov.pars[1], 2),
               round(original_sill, 2),
               round(fit.DBH$cov.pars[2], 2),
               round(-log(0.05) * fit.DBH$cov.pars[2], 2)),
  Residuals = c(round(fit.DBH.resid$nugget, 2),
                round(fit.DBH.resid$cov.pars[1], 2),
                round(residual_sill, 2),
                round(fit.DBH.resid$cov.pars[2], 2),
                round(-log(0.05) * fit.DBH.resid$cov.pars[2], 2))
)

print(comparison)

##              Parameter Original Residuals
## 1          Nugget (C₀)   542.27    332.24
## 2    Partial Sill (C₁)   456.00    107.62
## 3       Sill (C₀ + C₁)   998.27    439.86
## 4            Range (a)    12.71     17.72
## 5 Practical Range (3a)    38.07     53.07

# Interpretation
cat("\nSpecies explains", round(species_explained, 1), 
    "% of the spatial variation in DBH\n")

## 
## Species explains 55.9 % of the spatial variation in DBH

7. Interpretation Guide

7.1 What the Results Tell You

Detailed Interpretation of Variogram Results with Criteria
Component	Category	Criteria	Value
Species Effect	Strength	55.9% > 70%?	MODERATE
Species Effect	Interpretation	55.9% > 70%?	Species and environment both important
Species Effect	Action	55.9% > 70%?	Consider species + environment
Residual Structure	Strength	44.1% > 40%?	STRONG
Residual Structure	Interpretation	44.1% > 40%?	Model needs additional environmental predictors
Residual Structure	Action	44.1% > 40%?	Add environmental predictors; Consider spatial regression

7.2 Practical Recommendations

Recommendations Based on Variogram Analysis
Area	Category	Recommendation
Sampling Design	Independent Samples	Place plots at least 53 m apart
Sampling Design	Spatial Interpolation	Sample within 53 m for interpolation
Model Improvement	Predictors	Add soil nutrients, moisture, elevation
Model Improvement	Competition	Include competition indices
Model Improvement	Interactions	Test interaction terms (Species × Environment)
Spatial Methods	Prediction	Use kriging for spatial prediction
Spatial Methods	Advanced Method	Consider universal kriging with covariates
Spatial Methods	Diagnostic	Test for anisotropy (different ranges in different directions)

8. Advanced Topics

8.1 Directional Variograms (Anisotropy)

# Check for anisotropy in different directions
# Note: This requires the gstat package
library(gstat)

# Convert to SpatialPointsDataFrame
coordinates(WEF.dat) <- ~ X + Y

# Compute directional variograms
vario_directional <- variogram(DBH ~ 1, 
                               data = WEF.dat,
                               alpha = c(0, 45, 90, 135))

# Plot directional variograms
plot(vario_directional, 
     main = "Directional Variograms",
     xlab = "Distance (m)",
     ylab = "Semivariance")

8.2 Residual Maps

# Create residual surface
surf_resid <- mba.surf(cbind(coords, DBH.resid), 
                       no.X = x.res, 
                       no.Y = y.res, 
                       h = 5, 
                       m = 2, 
                       extend = FALSE)$xyz.est

# Plot residual map
image.plot(surf_resid, 
           xaxs = "r", 
           yaxs = "r", 
           xlab = "Easting (m)", 
           ylab = "Northing (m)", 
           col = colorRampPalette(c("blue", "white", "red"))(25),
           main = "Spatial Distribution of Residuals")

contour(surf_resid, add = TRUE)

# Add tree locations
points(coords, pch = 16, cex = 0.5)

8.3 Cross-Validation

# Kriging cross-validation
# Requires geoR package

# Prepare data for kriging
geo_data <- as.geodata(data.frame(coords, data = DBH))

# Perform kriging with cross-validation
cv <- krige.cv(geo_data, 
               model = fit.DBH,
               nsim = 100)

# View cross-validation results
summary(cv)

# Plot observed vs predicted
plot(DBH, cv$predicted,
     xlab = "Observed DBH (cm)",
     ylab = "Predicted DBH (cm)",
     main = "Kriging Cross-Validation")
abline(0, 1, col = "red", lwd = 2)

9. Complete Workflow Summary

Detailed Spatial Analysis Workflow
Step	Action
1. EXPLORATORY ANALYSIS	Visualize data on map
	Check species differences
	Identify spatial patterns
2. VARIAGRAM ANALYSIS	Compute empirical variogram
	Fit theoretical model (exponential)
	Extract parameters (nugget, sill, range)
3. MODELING	Fit linear model (DBH ~ Species)
	Extract residuals
	Compute residual variogram
4. MODEL EVALUATION	Compare original vs residual variograms
	Calculate species contribution to spatial pattern
	Assess remaining spatial structure
5. DECISION MAKING	If residuals show structure → Add predictors
	If residuals are random → Model is adequate
	Use parameters for sampling design

Appendix: Quick Reference

R Functions Used

Function	Package	Purpose
`variog()`	geoR	Compute empirical variogram
`variofit()`	geoR	Fit theoretical variogram model
`mba.surf()`	MBA	Spatial interpolation
`image.plot()`	fields	Plot spatial surfaces
`lm()`	stats	Linear model
`resid()`	stats	Extract residuals

Variogram Parameters Summary

Variogram Parameters Quick Reference
Symbol	Parameter	Value	Description	Key.Insight
C₀	Nugget	542.27	Measurement error + microscale variation	Variation at distances < min sampling interval
C₁	Partial Sill	456.00	Spatially structured variance	Variation explained by distance
C₀ + C₁	Sill	998.27	Total variance in the data	Plateau where semivariance stabilizes
a	Range	12.71	Distance to 63% of sill	Rate of spatial decay
3a	Practical Range	38.07	Distance to 95% of sill	Maximum distance of spatial influence