Spatial point analysis is a statistical technique used to study the patterns formed by individual points within a given space. It allows researchers to investigate the underlying spatial structures that may govern the distribution of these points, whether they are dispersed, clustered, or randomly distributed across the study area.
This form of analysis is particularly useful in ecology to understand
the distribution of organisms. The spatstat package in R
provides a comprehensive toolkit for performing such analyses, and it
includes a range of functions to simulate point patterns and apply
various tests and models to them.
This guide aims to equip you with the fundamental steps to conduct
spatial point analysis using spatstat, starting from
simulating point data to applying point pattern tests. By the end, you
should have a clearer understanding of how to implement these techniques
and interpret their results for your own spatial data.
To analyse spatial data in R we can use the spatstat
package. Install it with install.packages("spatstat") if
necessary.
library(spatstat)
Before illustrating spatial point analysis, I will use R to simulate some data with different properties so you can see how the analyses look under various circumstances.
Spatial data used in spatstat are held in a special
object called a ppp, or “planar point pattern” object. This
is a way of representing points on a 2-dimensional plane (surface) where
both the coordinates of the points and the extent of the area where the
points could be (the “window”) are included. The window does not have
to be a rectangle: it can also be defined as a polygon (i.e. any shape,
such as the outline of an island), using the poly argument of owin.
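As a minimal sketch of a non-rectangular window, the example below builds a triangular window and scatters points inside it. The triangle, the object names (island, island_pts) and the seed are made up for illustration; only owin, runifpoint and plot come from spatstat and base R.

```r
library(spatstat)

set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Hypothetical triangular "island". The vertices of an outer boundary
# are listed in anticlockwise order.
island <- owin(poly = list(x = c(0, 10, 5), y = c(0, 0, 8)))

# Place 20 uniformly random points inside the polygon
island_pts <- runifpoint(20, win = island)

# Plot the points together with the window outline
plot(island_pts, main = "Points in a polygonal window")
```

For real data you would replace the triangle with the coordinates of your own study-area boundary.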
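Here I make some spatial points (x and y coordinates), and put them
into a ppp object using the function ppp.
First I simulate points from a uniform random distribution. This kind of pattern is called “complete spatial randomness” (CSR). Because no random seed is set before this step, your CSR points, and the CSR results further down, will differ slightly from those shown here.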
# Define the coordinates of points
x <- runif(50, -10,10)
y <- runif(50, -10,10)
# Create a point pattern object from the x-y data
# The window argument tells R what the window looks like
csr_data <- ppp(x, y, window=owin(c(-10,10),c(-10,10)))
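spatstat also has its own simulators for random patterns, which may be more convenient than calling runif directly. The sketch below assumes the same square window as above; the object names and the seed are illustrative only.

```r
library(spatstat)

set.seed(42)  # arbitrary seed, for reproducibility of this sketch

win <- owin(c(-10, 10), c(-10, 10))

# Exactly 50 independent, uniformly distributed points
# (equivalent in spirit to the runif() approach)
csr_fixed <- runifpoint(50, win = win)

# A Poisson process: the number of points is itself random,
# with an expected value of lambda * area
lambda <- 50 / area(win)   # 50 points per 400 square units
csr_pois <- rpoispp(lambda, win = win)

npoints(csr_fixed)  # always 50
npoints(csr_pois)   # varies from one simulation to the next
```

Strictly speaking, CSR refers to the Poisson process; fixing the number of points at 50 is a common and convenient simplification.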
Next I use a combination of a uniform distribution and normal
distribution to define a clustered set of points. This works by using
the uniform distribution to define a center for n sets of
points, then using a normal distribution centered on each point to
define k points for each cluster.
set.seed(123) # Set a random seed for reproducibility
# Define the number of clusters and points per cluster
num_clusters <- 10
points_per_cluster <- 5
# Initialize a data frame to store the points
points <- data.frame(x = numeric(0), y = numeric(0), cluster = integer(0))
# Generate clusters
for (i in 1:num_clusters) {
  # Define the center of the cluster
  center_x <- runif(1, min = -10, max = 10)
  center_y <- runif(1, min = -10, max = 10)
  # Generate points around the center using rnorm
  x_values <- rnorm(points_per_cluster, mean = center_x, sd = 1)
  y_values <- rnorm(points_per_cluster, mean = center_y, sd = 1)
  # Combine the points into a data frame
  cluster_points <- data.frame(x = x_values, y = y_values, cluster = rep(i, points_per_cluster))
  # Add to the overall points data frame
  points <- rbind(points, cluster_points)
}
# You can also plot the points to visualize the clusters
plot(points$x, points$y, col=points$cluster, pch=19)
I now put those points into a ppp object.
clustered_data <- ppp(points$x, points$y, window=owin(c(-11,11),c(-11,11)))
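Because the cluster centers can lie anywhere up to ±10 and the points are scattered around them with a standard deviation of 1, a point can occasionally land outside the ±11 window. When that happens, ppp issues a warning and rejects the offending points. The sketch below shows one way to check for this beforehand with inside.owin; the demo coordinates are invented purely to show the mechanics.

```r
library(spatstat)

w <- owin(c(-11, 11), c(-11, 11))

# Hypothetical coordinates: the third point lies outside the window
demo <- data.frame(x = c(0, 5.2, 11.7), y = c(1, -3.4, 2))

# TRUE for each point that falls inside the window
inside <- inside.owin(demo$x, demo$y, w)

# Number of points that ppp() would reject
sum(!inside)

# Keep only the points that are inside the window
demo_ppp <- ppp(demo$x[inside], demo$y[inside], window = w)
```

The alternative is to enlarge the window, but remember that the window size affects the estimated density and therefore the tests that follow.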
So now we have two sets of data points to work with and explore methods: clustered and spatially random.
# Calculate nearest neighbor distances
nn_distances <- nndist(clustered_data)
# Summary statistics of the distances
summary(nn_distances)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.06536 0.37920 0.72583 0.84883 1.12650 2.24112
# Histogram of nearest neighbor distances
hist(nn_distances, breaks=20, main="Histogram of Nearest Neighbor Distances")
What does this show?
nndist calculates the nearest neighbor distance for each
point, which is the distance to its closest neighbor. The summary
provides basic statistics about this distribution: the mean, median,
quartiles, minimum, and maximum distances. A mean distance that is small
relative to what you would expect for randomly placed points at the same
density may suggest clustering; a large one may indicate regular spacing.
The histogram visually represents the frequency of different nearest neighbor distances. Clustered patterns typically show a peak at shorter distances, while more regular patterns have a peak at larger distances.
Let’s compare this to the CSR data.
# Calculate nearest neighbor distances
nn_distances <- nndist(csr_data)
# Summary statistics of the distances
summary(nn_distances)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2237 1.2828 1.5469 1.6382 2.0589 3.4515
# Histogram of nearest neighbor distances
hist(nn_distances, breaks=20, main="Histogram of Nearest Neighbor Distances")
Not super-easy to tell apart in my opinion!
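One thing that may make the comparison easier is to draw the two histograms side by side with identical bins. The sketch below simulates its own small data sets so that it runs on its own; the object names, the seed and the slightly larger window are assumptions made for the example, not part of the analysis above.

```r
library(spatstat)

set.seed(2024)  # arbitrary seed, for reproducibility of this sketch

# A generous window so that no simulated point falls outside it
win <- owin(c(-15, 15), c(-15, 15))

# Random pattern: 50 uniform points
random_pts <- runifpoint(50, win = win)

# Clustered pattern: 10 centers with 5 points each (sd = 1)
cx <- runif(10, -10, 10)
cy <- runif(10, -10, 10)
clustered_pts <- ppp(rep(cx, each = 5) + rnorm(50),
                     rep(cy, each = 5) + rnorm(50),
                     window = win)

d_random <- nndist(random_pts)
d_clustered <- nndist(clustered_pts)

# Use the same bins for both histograms so the shapes are comparable
breaks <- seq(0, ceiling(max(d_random, d_clustered)), by = 0.25)

op <- par(mfrow = c(1, 2))
hist(d_clustered, breaks = breaks, main = "Clustered",
     xlab = "Nearest neighbor distance")
hist(d_random, breaks = breaks, main = "Random",
     xlab = "Nearest neighbor distance")
par(op)  # restore the previous plotting layout
```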
The Clark and Evans test compares the observed mean nearest neighbor distance to the mean distance expected under a random (CSR) distribution with the same density. The test statistic, R, is the ratio of the observed to the expected mean distance, so a random pattern gives a value close to 1.
An R value significantly less than 1 suggests clustering, as points are closer together than expected by chance. An R value significantly greater than 1 indicates regular spacing, suggesting a repulsive interaction between points.
You could compare this statistic for different islands/areas.
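To make the ratio concrete, here is a minimal sketch that computes the uncorrected statistic by hand and compares it with clarkevans. Under CSR with density lambda (points per unit area), the expected mean nearest neighbor distance is 1 / (2 * sqrt(lambda)). The simulated pattern, the object names and the seed are assumptions for the example.

```r
library(spatstat)

set.seed(7)  # arbitrary seed, for reproducibility of this sketch

pts <- runifpoint(50, win = owin(c(-10, 10), c(-10, 10)))

# Density: number of points per unit area
lambda <- npoints(pts) / area(Window(pts))

# Expected mean nearest neighbor distance under CSR
expected_nn <- 1 / (2 * sqrt(lambda))

# Observed mean nearest neighbor distance
observed_nn <- mean(nndist(pts))

# Clark-Evans ratio computed by hand (no edge correction)
R_by_hand <- observed_nn / expected_nn

# The same quantity from spatstat, with edge correction switched off
R_naive <- clarkevans(pts, correction = "none")

R_by_hand
R_naive
```

The test output in this guide reports the Donnelly-corrected value, which adjusts for edge effects in a rectangular window, so it will not match the uncorrected ratio exactly.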
# Perform the Clark-Evans test
clarkevans.test(csr_data)
##
## Clark-Evans test
## Donnelly correction
## Z-test
##
## data: csr_data
## R = 1.088, p-value = 0.2051
## alternative hypothesis: two-sided
The CSR data have an R value that is not significantly different from 1, so there is no evidence of clustering (or of regular spacing).
Let’s check the clustered data…
# Perform the Clark-Evans test
clarkevans.test(clustered_data)
##
## Clark-Evans test
## Donnelly correction
## Z-test
##
## data: clustered_data
## R = 0.51247, p-value = 2.188e-12
## alternative hypothesis: two-sided
The R value is significantly smaller than 1, indicating clustering (as we expected, since we simulated the data to be clustered!)
The G-function is the cumulative distribution function of the nearest neighbor distances. It shows, for any distance r, the proportion of points whose nearest neighbor is within distance r. The shape of the curve provides insights into the spatial process. The plot includes the theoretical curve expected under CSR for reference: an estimated G-function above that curve suggests clustering (more short nearest neighbor distances than expected), while one below it indicates dispersion. You could examine how this varies among islands/areas.
# Estimate the G-function for the clustered data
G <- Gest(clustered_data)
# Plot the G-function
plot(G, main="Nearest Neighbor Distance (G-function)")
# Estimate the G-function for the CSR data
G <- Gest(csr_data)
# Plot the G-function
plot(G, main="Nearest Neighbor Distance (G-function)")
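The theoretical CSR curve is G(r) = 1 - exp(-lambda * pi * r^2), where lambda is the density of points. The sketch below checks this against the theo column returned by Gest, and then draws a simulation envelope, which gives a rough sense of how far an estimated curve can stray from the theoretical one by chance alone. The simulated pattern, the object names, the seed and the choice of 39 simulations are assumptions for the example.

```r
library(spatstat)

set.seed(99)  # arbitrary seed, for reproducibility of this sketch

pts <- runifpoint(50, win = owin(c(-10, 10), c(-10, 10)))

G <- Gest(pts)

# Theoretical G-function under CSR, evaluated at the same r values
lambda <- npoints(pts) / area(Window(pts))
G_csr <- 1 - exp(-lambda * pi * G$r^2)

# Should agree with the 'theo' column stored in the Gest result
all.equal(G$theo, G_csr)

# Pointwise envelope from 39 simulated CSR patterns
env <- envelope(pts, Gest, nsim = 39, verbose = FALSE)
plot(env, main = "G-function with CSR simulation envelope")
```

An estimated curve that leaves the shaded envelope over a range of distances is a hint, rather than a formal test, that the pattern departs from CSR at those distances.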
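quadrat.test checks for Complete Spatial Randomness
(CSR) by dividing the study area into quadrats and comparing the
observed count of points in each quadrat with the count expected under
CSR, using a chi-squared test. This is closely related to comparing the
variance of the counts to their mean. The nx and ny arguments define the
number of quadrats in the x and y directions. If points are randomly
distributed, the counts in equal-sized quadrats should vary only as much
as chance allows. Significant deviation from this indicates a departure
from CSR, suggesting either clustering (counts more variable than
expected) or regularity (counts more even than expected). Again you could
check how this statistic varies across areas/islands.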
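The following sketch computes the chi-squared statistic by hand from the quadrat counts, and shows how it relates to the variance-to-mean ratio of those counts. It assumes a rectangular window split into equal-sized quadrats; the simulated pattern, the object names and the seed are made up for the example.

```r
library(spatstat)

set.seed(11)  # arbitrary seed, for reproducibility of this sketch

pts <- runifpoint(50, win = owin(c(-10, 10), c(-10, 10)))

# Count the points in each cell of a 5 by 5 grid
counts <- quadratcount(pts, nx = 5, ny = 5)
observed <- as.vector(counts)

# With equal-sized quadrats, every quadrat has the same expected count
expected <- npoints(pts) / length(observed)   # 50 / 25 = 2

# Pearson chi-squared statistic
X2_by_hand <- sum((observed - expected)^2 / expected)

# Variance-to-mean ratio (index of dispersion) of the counts;
# X2 equals this ratio multiplied by (number of quadrats - 1)
dispersion <- var(observed) / mean(observed)

# Compare with quadrat.test (the small-count warning is suppressed here)
qt <- suppressWarnings(quadrat.test(pts, nx = 5, ny = 5))

X2_by_hand
qt$statistic
```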
# Quadrat analysis for CSR
quadrat.test(csr_data, nx=5, ny=5)
## Warning: Some expected counts are small; chi^2 approximation may be inaccurate
##
## Chi-squared test of CSR using quadrat counts
##
## data: csr_data
## X2 = 14, df = 24, p-value = 0.1067
## alternative hypothesis: two.sided
##
## Quadrats: 5 by 5 grid of tiles
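The warning appears because 50 points spread over 25 quadrats gives an expected count of only 2 per quadrat, which is low for the chi-squared approximation. Two possible ways around this are sketched below: use fewer, larger quadrats, or ask quadrat.test for a Monte Carlo p-value instead. The simulated pattern, the object names, the seed and the number of simulations are assumptions for the example.

```r
library(spatstat)

set.seed(5)  # arbitrary seed, for reproducibility of this sketch

pts <- runifpoint(50, win = owin(c(-10, 10), c(-10, 10)))

# Option 1: a coarser 3 by 3 grid, giving 50 / 9 (about 5.6)
# expected points per quadrat
qt_coarse <- quadrat.test(pts, nx = 3, ny = 3)
qt_coarse

# Option 2: keep the 5 by 5 grid but obtain the p-value by simulation
# rather than from the chi-squared distribution
qt_mc <- quadrat.test(pts, nx = 5, ny = 5,
                      method = "MonteCarlo", nsim = 999)
qt_mc
```

Coarser quadrats trade spatial detail for a more trustworthy approximation, so it can be worth trying more than one grid size.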
Now for the clustered data:
# Quadrat analysis for CSR
quadrat.test(clustered_data, nx=5, ny=5)
## Warning: Some expected counts are small; chi^2 approximation may be inaccurate
##
## Chi-squared test of CSR using quadrat counts
##
## data: clustered_data
## X2 = 80, df = 24, p-value = 1.217e-07
## alternative hypothesis: two.sided
##
## Quadrats: 5 by 5 grid of tiles
For the CSR data the p-value (0.1067) gives no evidence of a departure from CSR, whereas for the clustered data the very small p-value (1.217e-07) indicates a clear departure from CSR, as expected.
This guide has walked you through some of the basic analyses that are
possible for spatial point patterns using the spatstat
package. The key to insightful spatial analysis lies in the careful
interpretation of statistical outputs in the light of your study’s
specific context. Knowing whether your data (or different parts of them)
indicate randomness, clustering, or regularity should help you understand
the underlying spatial processes at play.
For more detail, see the spatstat website: https://spatstat.github.io/spatstat/
Additionally, the book ‘Spatial Point Patterns: Methodology and Applications with R’ by Baddeley, Rubak, and Turner offers an extensive treatment of the subject. For a broader understanding of spatial analysis in ecology, ‘Spatial Analysis: A Guide for Ecologists’ by Fortin and Dale provides a detailed overview. They may be available in the library, or you could get them via inter-library loan.