Introduction

Spatial point analysis is a statistical technique used to study the patterns formed by individual points within a given space. It allows researchers to investigate the underlying spatial structures that may govern the distribution of these points, whether they are dispersed, clustered, or randomly distributed across the study area.

This form of analysis is particularly useful in ecology to understand the distribution of organisms. The spatstat package in R provides a comprehensive toolkit for performing such analyses, and it includes a range of functions to simulate point patterns and apply various tests and models to them.

This guide walks through the fundamental steps of spatial point analysis with spatstat, from simulating point data to applying point pattern tests. By the end you should have a clearer understanding of how to apply these techniques and interpret their results for your own spatial data.

Set up and data simulation

Install package

To analyse spatial data in R we can use the spatstat package. Install it with install.packages("spatstat") if necessary.

library(spatstat)

Simulate some data

Before illustrating spatial point analysis, I will use R to simulate some data with different properties so you can see how the analyses look under various circumstances.

Spatial data used in spatstat are held in a special object called a ppp, or “planar point pattern” object. This represents points on a 2-dimensional plane (surface) and stores both the coordinates of the points and the extent of the area in which points could occur (the “window”). The window does not have to be a rectangle: it can also be defined as a polygon (i.e. any shape, such as the outline of an island).
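
As a minimal sketch of a non-rectangular window, the following builds a polygonal owin from a set of made-up vertices (the “island” outline is purely illustrative, not real data) and scatters uniform random points inside it with runifpoint. Note that owin expects the vertices of an outer boundary to be listed in anticlockwise order.

```r
library(spatstat)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical island outline; vertices listed anticlockwise
island <- owin(poly = list(x = c(-8, 6, 9, 2, -6),
                           y = c(-6, -8, 3, 9, 5)))

# 50 uniform random points inside the polygon
island_points <- runifpoint(50, win = island)

plot(island_points, main = "Points in a polygonal window")
```

The resulting object can be passed to the same analysis functions as a pattern in a rectangular window, although some edge corrections only apply to rectangles.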

Here I make some spatial points (x and y coordinates) and put them into a ppp object using the function ppp.

First I simulate points from a uniform random distribution. This kind of pattern is called “complete spatial randomness” (CSR).

# Define the coordinates of points
x <- runif(50, -10,10)
y <- runif(50, -10,10)

# Create a point pattern object from the x-y data
# The window argument tells R what the window looks like
csr_data <- ppp(x, y, window=owin(c(-10,10),c(-10,10)))
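
spatstat also provides its own generators for CSR patterns, which avoids building the coordinates by hand. The sketch below assumes the same 20 × 20 window as above: runifpoint places a fixed number of points, whereas rpoispp simulates a homogeneous Poisson process in which the number of points is itself random. The object names are only for illustration.

```r
library(spatstat)

set.seed(1)  # arbitrary seed for repeatability
win <- owin(c(-10, 10), c(-10, 10))

# Exactly 50 independent uniform points
csr_fixed <- runifpoint(50, win = win)

# Poisson process with intensity 50 / area, so about 50 points on average
csr_poisson <- rpoispp(lambda = 50 / area(win), win = win)

npoints(csr_fixed)
npoints(csr_poisson)  # varies from run to run
```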

Next I use a combination of a uniform distribution and a normal distribution to define a clustered set of points. The uniform distribution supplies a center for each of the clusters, and a normal distribution centered on each of those centers then supplies the points belonging to that cluster (here 10 clusters of 5 points each, with a standard deviation of 1).

set.seed(123) # Set a random seed for reproducibility

# Define the number of clusters and points per cluster
num_clusters <- 10
points_per_cluster <- 5

# Initialize a data frame to store the points
points <- data.frame(x = numeric(0), y = numeric(0), cluster = integer(0))

# Generate clusters
for (i in 1:num_clusters) {
  # Define the center of the cluster
  center_x <- runif(1, min = -10, max = 10)
  center_y <- runif(1, min = -10, max = 10)
  
  # Generate points around the center using rnorm
  x_values <- rnorm(points_per_cluster, mean = center_x, sd = 1)
  y_values <- rnorm(points_per_cluster, mean = center_y, sd = 1)
  
  # Combine the points into a data frame
  cluster_points <- data.frame(x = x_values, y = y_values, cluster = rep(i, points_per_cluster))
  
  # Add to the overall points data frame
  points <- rbind(points, cluster_points)
}

# You can also plot the points to visualize the clusters
plot(points$x, points$y, col=points$cluster, pch=19)

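I now put those points into a ppp object. The window is made slightly larger than the range used for the cluster centers, because the normal scatter can place points beyond ±10. Any point that still falls outside the window is rejected by ppp with a warning.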

clustered_data <- ppp(points$x, points$y, window=owin(c(-11,11),c(-11,11)))
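
The loop above is essentially a hand-rolled cluster process. spatstat has a closely related built-in simulator, rThomas, which scatters offspring around uniformly placed parents using an isotropic normal displacement. It is not identical to the code above: the number of parents and the number of offspring per parent are Poisson random variables rather than being fixed at 10 and 5. The parameter values below are chosen only to mimic the example.

```r
library(spatstat)

set.seed(123)
win <- owin(c(-11, 11), c(-11, 11))

# Arguments, in order: parent intensity, standard deviation of the
# offspring displacement, mean number of offspring per parent
thomas_data <- rThomas(10 / area(win), 1, 5, win = win)

plot(thomas_data, main = "Simulated Thomas cluster process")
```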

Point analysis

We now have two point patterns to explore the methods with: one clustered and one completely spatially random.

Nearest neighbour analysis

# Calculate nearest neighbor distances
nn_distances <- nndist(clustered_data)
# Summary statistics of the distances
summary(nn_distances)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.06536 0.37920 0.72583 0.84883 1.12650 2.24112

# Histogram of nearest neighbor distances
hist(nn_distances, breaks=20, main="Histogram of Nearest Neighbor Distances")

What does this show?

nndist calculates the nearest neighbor distance for each point, which is the distance to its closest neighbor. The summary provides basic statistics about this distribution: the mean, median, minimum, and maximum distances. A mean distance that is small relative to what a random pattern of the same density would give may suggest clustering; a relatively large one may indicate regular spacing.

The histogram visually represents the frequency of different nearest neighbor distances. Clustering patterns typically show a peak at shorter distances, while more regular patterns have a peak at larger distances.

Let’s compare this to the CSR data.

# Calculate nearest neighbor distances
nn_distances <- nndist(csr_data)
# Summary statistics of the distances
summary(nn_distances)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2237  1.2828  1.5469  1.6382  2.0589  3.4515

# Histogram of nearest neighbor distances
hist(nn_distances, breaks=20, main="Histogram of Nearest Neighbor Distances")

Not super-easy to tell apart in my opinion!
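
One reason the two histograms are hard to compare is that each is drawn with its own bins and axis range. The sketch below draws two histograms side by side on shared breaks. It uses freshly simulated stand-in patterns so that it runs on its own; csr_data and clustered_data from above could be substituted.

```r
library(spatstat)

set.seed(1)
win <- owin(c(-10, 10), c(-10, 10))

# Stand-in patterns: one CSR, one with 10 tight clusters of 5 points
random_pattern <- runifpoint(50, win = win)
cx <- rep(runif(10, -8, 8), each = 5)
cy <- rep(runif(10, -8, 8), each = 5)
cluster_pattern <- ppp(cx + rnorm(50, sd = 0.5),
                       cy + rnorm(50, sd = 0.5), window = win)

nn_random  <- nndist(random_pattern)
nn_cluster <- nndist(cluster_pattern)

# Common bin edges covering both sets of distances
breaks <- seq(0, max(nn_random, nn_cluster) * 1.05, length.out = 21)

old_par <- par(mfrow = c(1, 2))
hist(nn_random,  breaks = breaks, main = "CSR",       xlab = "NN distance")
hist(nn_cluster, breaks = breaks, main = "Clustered", xlab = "NN distance")
par(old_par)
```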

Clark and Evans Test

The Clark and Evans test compares the observed mean nearest neighbor distance to the mean distance expected under a random (CSR) distribution with the same density. The test statistic, R, is the ratio of the observed to the expected mean distance.

An R value significantly less than 1 suggests clustering, as points are closer together than expected by chance. An R value significantly greater than 1 indicates regular spacing, suggesting a repulsive interaction between points.
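
To make the ratio concrete: under CSR with intensity λ (points per unit area), the expected mean nearest neighbor distance is 1/(2√λ). The sketch below computes the uncorrected ratio by hand on a simulated stand-in pattern and compares it with clarkevans(..., correction = "none"). The test output shown in this guide uses the Donnelly edge correction, so its R value will differ a little from this naive version.

```r
library(spatstat)

set.seed(1)
pattern <- runifpoint(50, win = owin(c(-10, 10), c(-10, 10)))

lambda   <- npoints(pattern) / area(Window(pattern))  # points per unit area
observed <- mean(nndist(pattern))                     # observed mean NN distance
expected <- 1 / (2 * sqrt(lambda))                    # expected under CSR

R_by_hand <- observed / expected
R_by_hand

# spatstat's uncorrected ("naive") index, for comparison
clarkevans(pattern, correction = "none")
```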

You could compare this statistic for different islands/areas.

# Perform the Clark-Evans test
clarkevans.test(csr_data)

## 
##  Clark-Evans test
##  Donnelly correction
##  Z-test
## 
## data:  csr_data
## R = 1.088, p-value = 0.2051
## alternative hypothesis: two-sided

The CSR data have an R value that is not significantly different from 1, so there is no evidence of either clustering or regular spacing.

Let’s check the clustered data…

# Perform the Clark-Evans test
clarkevans.test(clustered_data)

## 
##  Clark-Evans test
##  Donnelly correction
##  Z-test
## 
## data:  clustered_data
## R = 0.51247, p-value = 2.188e-12
## alternative hypothesis: two-sided

The R value is significantly smaller than 1, indicating clustering (as we expected, since we defined the data to be clustered!).
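
Comparing areas can be sketched by subsetting a pattern to different sub-windows and computing the index for each. The west/east split below is a hypothetical stand-in for real islands or plots, and the pattern is simulated so that the example runs on its own.

```r
library(spatstat)

set.seed(1)
pattern <- runifpoint(100, win = owin(c(-10, 10), c(-10, 10)))

# Hypothetical sub-areas: western and eastern halves of the study area
areas <- list(west = owin(c(-10, 0), c(-10, 10)),
              east = owin(c(0, 10),  c(-10, 10)))

# Subset the pattern to each area, then compute the Clark-Evans index
R_by_area <- sapply(areas, function(w) {
  unname(clarkevans(pattern[w], correction = "Donnelly"))
})
R_by_area
```

The Donnelly correction assumes a rectangular window, so for irregular areas a different correction (or none) would be needed.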

Nearest Neighbour Function (G-function)

The G-function is the cumulative distribution function of the nearest neighbor distances. It shows, for any distance r, the proportion of points whose nearest neighbor is within distance r. The shape of the curve provides insights into the spatial process. In the plot, the estimated curves are compared with the theoretical curve expected under CSR: an estimated G-function above the theoretical curve suggests clustering (nearest neighbors are closer than expected), while one below it indicates regular spacing. You could examine how this varies among islands/areas.
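
As a rough illustration of what Gest estimates, the sketch below builds an uncorrected empirical G-function with ecdf and compares it with the CSR benchmark G(r) = 1 − exp(−λπr²), where λ is the intensity (points per unit area). It ignores edge effects, which Gest corrects for, and it uses a simulated stand-in pattern.

```r
library(spatstat)

set.seed(1)
pattern <- runifpoint(50, win = owin(c(-10, 10), c(-10, 10)))

lambda <- intensity(pattern)       # points per unit area
G_hat  <- ecdf(nndist(pattern))    # empirical CDF of NN distances

r <- seq(0, 4, by = 0.05)
G_csr <- 1 - exp(-lambda * pi * r^2)  # theoretical G under CSR

plot(r, G_hat(r), type = "s", ylab = "G(r)",
     main = "Uncorrected empirical G vs CSR")
lines(r, G_csr, lty = 2)
legend("bottomright", legend = c("empirical", "CSR"), lty = c(1, 2))
```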

# Estimate the G-function
G <- Gest(clustered_data)
# Plot the G-function
plot(G, main="Nearest Neighbor Distance (G-function)")

# Estimate the G-function
G <- Gest(csr_data)
# Plot the G-function
plot(G, main="Nearest Neighbor Distance (G-function)")
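
Judging by eye whether an estimated curve departs from the theoretical one is difficult. One common aid is a simulation envelope: the envelope function repeatedly simulates CSR patterns in the same window and shows the range of G curves they produce. This is a sketch on a simulated stand-in pattern, and nsim = 39 is an arbitrary choice that keeps it quick.

```r
library(spatstat)

set.seed(1)
pattern <- runifpoint(50, win = owin(c(-10, 10), c(-10, 10)))

# Pointwise envelope of G under CSR from 39 simulated patterns
G_env <- envelope(pattern, Gest, nsim = 39, verbose = FALSE)

plot(G_env, main = "G-function with CSR simulation envelope")
```

An observed curve that leaves the shaded band suggests a departure from CSR at those distances, although a pointwise envelope is not a formal test across all distances at once.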

Quadrat test

quadrat.test checks for Complete Spatial Randomness (CSR) by dividing the study area into quadrats and comparing the number of points counted in each quadrat with the number expected under CSR, using a chi-squared test. For equal-sized quadrats this amounts to comparing the variance of the counts to their mean. The nx and ny arguments define the number of quadrats in the x and y directions. If points are randomly distributed, the counts in the quadrats should be similar. Significant deviation from this indicates a departure from CSR, suggesting either clustering or regularity. Again, you could check how this statistic varies across areas/islands.

# Quadrat analysis for CSR
quadrat.test(csr_data, nx=5, ny=5)

## Warning: Some expected counts are small; chi^2 approximation may be inaccurate

## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  csr_data
## X2 = 14, df = 24, p-value = 0.1067
## alternative hypothesis: two.sided
## 
## Quadrats: 5 by 5 grid of tiles

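The p-value of 0.1067 gives no significant evidence against CSR for the randomly generated data, as expected. Now for the clustered data: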

# Quadrat analysis for CSR
quadrat.test(clustered_data, nx=5, ny=5)

## Warning: Some expected counts are small; chi^2 approximation may be inaccurate

## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  clustered_data
## X2 = 80, df = 24, p-value = 1.217e-07
## alternative hypothesis: two.sided
## 
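## Quadrats: 5 by 5 grid of tiles

For the clustered data the p-value is far below 0.05, so CSR is rejected. The large X2 value relative to its degrees of freedom means that the counts vary more among quadrats than CSR would produce, which is consistent with clustering.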
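
The link between the chi-squared statistic and the variance-to-mean ratio can be checked directly: with m equal-area quadrats, X2 = (m − 1) × variance / mean of the counts. The sketch below verifies this on a simulated stand-in pattern with a 3 × 3 grid (so that the expected count per quadrat exceeds 5). It also shows the method = "MonteCarlo" option of quadrat.test, which is one possible response to the warning about small expected counts.

```r
library(spatstat)

set.seed(1)
pattern <- runifpoint(50, win = owin(c(-10, 10), c(-10, 10)))

# Counts per quadrat on a 3 x 3 grid
counts <- as.vector(quadratcount(pattern, nx = 3, ny = 3))

# Variance-to-mean ratio (about 1 under CSR, larger when clustered)
vmr <- var(counts) / mean(counts)
X2_by_hand <- (length(counts) - 1) * vmr

qt <- quadrat.test(pattern, nx = 3, ny = 3)
c(by_hand = X2_by_hand, quadrat_test = unname(qt$statistic))

# Monte Carlo p-value for a finer grid, avoiding the chi-squared approximation
quadrat.test(pattern, nx = 5, ny = 5, method = "MonteCarlo", nsim = 999)
```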

Summary

This guide has walked you through some of the basic analyses that are possible for spatial point patterns using the spatstat package. The key to insightful spatial analysis lies in the careful interpretation of statistical outputs in the light of your study’s specific context. Whether your data (or different parts of them) indicate randomness, clustering, or regularity should help you understand the underlying spatial processes at play.

References

https://spatstat.github.io/spatstat/

Additionally, the book ‘Spatial Point Patterns: Methodology and Applications with R’ by Baddeley, Rubak, and Turner offers an extensive treatment of the subject. For a broader understanding of spatial analysis in ecology, ‘Spatial Analysis: A Guide for Ecologists’ by Fortin and Dale provides a detailed overview. They may be available in the library, or you could get them via inter-library loan.

Spatial point analysis

Owen Jones

2023-11-03