Lecture 3: Introductory statistics using R

Glenna Nightingale

2020-09-03

Point pattern characteristics

Point patterns are datasets generated from the locations of objects in space. Examples of point patterns include the locations of trees in a forest, the locations of ants’ nests in a field, the seating locations of patrons in a restaurant or of students in a classroom, and the locations of the residences of biscuit monsters (as in the course dataset).

Point patterns can be described as possessing first-order and second-order characteristics. First-order characteristics refer to the intensity of points in the point pattern (the expected number of points per unit area), which is analogous to the concept of the “mean” in conventional statistics. Second-order characteristics refer to the proximity of points to one another, which is analogous to the concept of dispersion or spread in conventional statistics.
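As a minimal sketch of the first-order summary (base R only, using a hypothetical simulated pattern rather than the course dataset), the intensity of a homogeneous pattern can be estimated as the number of points divided by the area of the observation window, $\hat\lambda = n/A$:

```r
# Hypothetical example: 50 points placed uniformly at random in a
# 2 x 1 rectangular observation window (not the course dataset)
set.seed(1)
n <- 50
x <- runif(n, min = 0, max = 2)
y <- runif(n, min = 0, max = 1)

# Area of the observation window
A <- (2 - 0) * (1 - 0)

# Estimated intensity: expected number of points per unit area
lambda_hat <- n / A
lambda_hat   # 25
```

With spatstat, calling `intensity()` on a `ppp` object should give the same quantity for a homogeneous pattern.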

I’ve simulated some points to illustrate these concepts further. In Figure 1 below, the residences of the biscuit monsters appear to be distributed in clusters; in fact, in later years, these clustered residences were observed to be located near biscuit factories!
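The simulation used for the course data is not shown here. As a hypothetical sketch only, clustered “residences” around “factories” can be generated in base R with a Thomas-type cluster process: place a few parent points (factories) uniformly at random, then scatter offspring points (residences) around each parent with Gaussian noise. The names `n_factories`, `per_factory` and `sigma` are illustrative assumptions.

```r
# Hypothetical simulation of clustered "residences" around "factories"
# (a Thomas-type cluster process); this is not the author's simulation code
set.seed(42)
n_factories <- 4
per_factory <- 25
sigma <- 0.03   # spread of residences around each factory

# Parent points: factories placed uniformly in the unit square
factories <- data.frame(Long = runif(n_factories), Lat = runif(n_factories))

# Offspring points: each residence is displaced from its factory by Gaussian noise
idx <- rep(seq_len(n_factories), each = per_factory)
residences <- data.frame(
  Long = factories$Long[idx] + rnorm(length(idx), sd = sigma),
  Lat  = factories$Lat[idx]  + rnorm(length(idx), sd = sigma)
)

# Combine into one data frame with the Long, Lat and key columns
# used in the Figure 1 code
sim <- rbind(
  data.frame(residences, key = "residence"),
  data.frame(factories,  key = "factory")
)
table(sim$key)
```

Residences near the edge of the unit square can fall slightly outside it; for a formal analysis the window would need to be chosen accordingly.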

The R code used to produce Figure 1 is provided below.

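library(ggplot2)

# Scatter plot of residences and factories, coloured by point type ("key"),
# with one panel per time period stacked in a single column
ggplot(thebiscuits, aes(x = Long, y = Lat, color = key)) +
  geom_point(size = 3) +
  scale_color_manual(values = c("#3182bd", "#de2d26")) +
  ggtitle("Sightings of biscuit monster residences (and biscuit factories)") +
  xlab("Longitude") + ylab("Latitude") +
  facet_wrap(~ time, ncol = 1, scales = "free_y") +
  theme(text = element_text(size = 20))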

There are various metrics used to describe point patterns. In this lecture, I’ll focus on two of them: Ripley’s K function and the pair correlation function.

Ripley’s K function

Ripley’s K function is useful for describing the distribution of points in a point pattern at various spatial scales. The estimate $\hat K(r)$ describes the distribution of a given point pattern $\mathbf{x}$ at an inter-point distance $r$; formally, $\lambda K(r)$ is the expected number of other points within distance $r$ of a typical point, where $\lambda$ is the intensity. When the estimate is computed over a range of values of $r$, changes in the pattern across spatial scales can be detected easily: for example, a point pattern could be clustered at small values of $r$ and random at larger values.

One assumption for this function is that the generating process is stationary (i.e. spatially homogeneous).

The formula used for estimation is given as:

\[ \hat K(r) = \frac{A}{n(n-1)} \sum_{i} \sum_{j \neq i} I(d_{ij} \leq r)\, e_{ij} \]

where $A$ is the area of the observation window, $n$ is the number of points in the point pattern, $i$ indexes the focal point and $j$ a neighbouring point, $d_{ij}$ is the distance between the focal point and the neighbouring point, $e_{ij}$ is an edge-correction weight, and $I(\cdot)$ is the indicator function (equal to 1 if the condition holds and 0 otherwise).
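A minimal base-R sketch of this estimator, assuming no edge correction (every $e_{ij} = 1$) and using hypothetical CSR data in the unit square; `K_hat` is an illustrative helper, not part of any package:

```r
# Minimal sketch of the K-function estimator, assuming no edge correction
# (e_ij = 1 for every pair). spatstat's Kest() applies proper edge
# corrections, so its values will differ, especially at larger r.
K_hat <- function(x, y, r, area) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))  # d_ij: pairwise distances
  diag(d) <- Inf                     # exclude the i == j terms
  # For each r, count ordered pairs (i, j), j != i, with d_ij <= r
  counts <- sapply(r, function(rr) sum(d <= rr))
  area / (n * (n - 1)) * counts
}

# Example: 100 CSR points in the unit square (hypothetical data)
set.seed(1)
x <- runif(100)
y <- runif(100)
r <- seq(0, 0.25, by = 0.05)

# Compare the estimate with the CSR value pi * r^2
round(cbind(r, K = K_hat(x, y, r, area = 1), CSR = pi * r^2), 4)
```

Because no edge correction is applied, neighbours lying outside the window are never counted, so this estimate tends to fall below $\pi r^2$ at larger $r$ even for CSR data; correcting that bias is the role of $e_{ij}$.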

Frequently, the estimate of Ripley’s K function is compared with that expected for a “reference” point pattern, generated from a homogeneous Poisson process, which exhibits complete spatial randomness (CSR); for such a pattern, $K(r) = \pi r^2$. In Figure 2, the solid black line represents the estimate of the function for the data, the red dotted line the theoretical value for a point pattern exhibiting CSR, and the grey band the simulation envelope obtained by simulating 100 point patterns from a Poisson process. The data used in Figure 2 are the locations of biscuit monster residences in 1980 (before the biscuit factories appeared in the area).
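To show what a simulation envelope contains, here is a base-R sketch in the spirit of spatstat’s `envelope()`: simulate CSR patterns with the same number of points in the same window, estimate $K$ for each, and take the pointwise minimum and maximum at each $r$. It assumes no edge correction and uses hypothetical clustered data; it is not spatstat’s implementation.

```r
# Minimal sketch of a pointwise Monte Carlo envelope for K(r),
# assuming no edge correction (e_ij = 1) and hypothetical data
K_hat <- function(x, y, r, area) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf
  area / (n * (n - 1)) * sapply(r, function(rr) sum(d <= rr))
}

set.seed(2)
n <- 60
nsim <- 100
r <- seq(0, 0.2, by = 0.01)

# Observed pattern: hypothetical points clustered around two centres
obs_x <- c(rnorm(n / 2, 0.3, 0.05), rnorm(n / 2, 0.7, 0.05))
obs_y <- c(rnorm(n / 2, 0.3, 0.05), rnorm(n / 2, 0.7, 0.05))
K_obs <- K_hat(obs_x, obs_y, r, area = 1)

# K for each of nsim CSR patterns in the unit square
# (rows = values of r, columns = simulations)
K_sim <- replicate(nsim, K_hat(runif(n), runif(n), r, area = 1))

# Pointwise envelope: smallest and largest simulated value at each r
lo <- apply(K_sim, 1, min)
hi <- apply(K_sim, 1, max)

head(cbind(r, K_obs, lo, hi))
```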

The R code used to produce Figure 2 is provided below.

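library(spatstat)  # ppp(), owin(), Kest(), envelope()

# Keep only the 1980 residences (before the biscuit factories)
thebiscuits2 = thebiscuits[thebiscuits$years == 1980, ]

# Convert to a spatstat point pattern; the observation window is the
# bounding rectangle of the observed coordinates
regular = ppp(thebiscuits2$Long, thebiscuits2$Lat,
              window = owin(range(thebiscuits2$Long), range(thebiscuits2$Lat)))

# Estimate K(r) and a pointwise envelope from 100 simulated CSR patterns
plot(envelope(regular, Kest, nsim = 100), ylim = c(0, 0.02),
     lwd = 3, main = "Figure 2: Ripley's K function for pattern of biscuit monster residences",
     legend = TRUE)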


If the solid black line lies above the simulation envelope at a given value of $r$, the observed point pattern is classified as more clustered at that distance than expected for a point pattern exhibiting CSR. If it lies below the envelope, the point pattern is classified as more regular than expected under CSR. If the black line lies within the envelope, there is no evidence at that distance against the hypothesis that the points are randomly distributed (CSR).
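The decision rule above can be written as a small hypothetical helper, `classify_envelope`, that labels each value of $r$. The inputs are assumed to be the observed estimate and the lower and upper envelope at each $r$ (for a spatstat envelope object, the `obs`, `lo` and `hi` columns of `as.data.frame()` of the result).

```r
# Hypothetical helper: label each r using the envelope decision rule
classify_envelope <- function(obs, lo, hi) {
  ifelse(obs > hi, "clustered",
         ifelse(obs < lo, "regular", "consistent with CSR"))
}

# Toy values (made-up numbers, not the biscuit data)
classify_envelope(obs = c(0.05, 0.02, 0.001),
                  lo  = c(0.01, 0.01, 0.01),
                  hi  = c(0.03, 0.03, 0.03))
# "clustered" "consistent with CSR" "regular"
```

These are pointwise envelopes: with `nsim = 100` and the envelope taken as the simulated minimum and maximum, the test at any single fixed $r$ has a significance level of about $2/101 \approx 0.02$, but checking many values of $r$ at once makes a chance excursion outside the envelope more likely.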

Pair correlation function

This function, denoted $g(r)$, summarizes the dependence (or “association”) between points separated by a given distance $r$. The function $g(r)$ can be expressed as:

\[ g(r) = \frac{K'(r)}{2\pi r}\] where $K'(r)$ is the derivative of Ripley’s K function, $K(r)$, with respect to $r$ (Stoyan and Penttinen 2000; Baddeley et al. 2007).
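As a worked check of this formula, under CSR $K(r) = \pi r^2$, so $K'(r) = 2\pi r$ and $g(r) = 1$. The hypothetical helper `pcf_from_K` below approximates $g(r)$ from $K$ values on an evenly spaced grid of $r$ using central finite differences, and confirms this numerically. (spatstat’s `pcf()` applied to a point pattern estimates $g(r)$ directly by kernel smoothing rather than in this way.)

```r
# Minimal sketch: approximate g(r) = K'(r) / (2*pi*r) from K values on an
# evenly spaced grid of r, using central finite differences for K'(r)
pcf_from_K <- function(r, K) {
  h <- r[2] - r[1]                 # grid spacing (assumed constant)
  i <- 2:(length(r) - 1)           # interior grid points only
  dK <- (K[i + 1] - K[i - 1]) / (2 * h)
  data.frame(r = r[i], g = dK / (2 * pi * r[i]))
}

# Check under CSR, where K(r) = pi * r^2, so g(r) should equal 1
r <- seq(0.01, 0.2, by = 0.01)
g_csr <- pcf_from_K(r, pi * r^2)
range(g_csr$g)
```

For a quadratic $K$ the central difference is exact, so both values of the range are 1 up to floating-point error; for an empirical $\hat K(r)$ the differences are noisy, which is one reason $g(r)$ is usually estimated directly.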

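Using the “biscuit” analogy, $g(r)$ describes the view of a biscuit monster standing at its own residence in the study area: how common other residences are at distance $r$ from it, relative to what would be expected if residences were placed completely at random.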

For a point pattern exhibiting CSR, $g(r)=1$ for all $r$. When $g(r)>1$, pairs of points separated by distance $r$ occur more frequently than would be expected under CSR, indicating clustering of points; using our analogy, clustering of biscuit monster residences. Conversely, $g(r)<1$ suggests regularity (a regular distribution of points). If $g(r)=0$, there are no pairs of points separated by distance $r$; when this holds for all distances up to some value, that value is a “hard core” radius, i.e. a minimum permitted distance between points.
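A hypothetical base-R illustration of a hard core: points are added to the unit square one at a time, and any candidate closer than $h$ to an existing point is rejected (simple sequential inhibition; spatstat provides a simulator for this process as `rSSI()`). No pair is then closer than $h$, so the pair counts, and hence $\hat K(r)$ and $g(r)$, are zero for $r < h$. The values of `h` and `n_target` are illustrative assumptions.

```r
# Hypothetical hard-core example via simple sequential inhibition
set.seed(3)
h <- 0.05          # hard-core distance
n_target <- 80     # number of points to place
x <- numeric(0)
y <- numeric(0)
while (length(x) < n_target) {
  cx <- runif(1)
  cy <- runif(1)
  # Accept the candidate only if it is at least h from every existing point
  if (all(sqrt((x - cx)^2 + (y - cy)^2) >= h)) {
    x <- c(x, cx)
    y <- c(y, cy)
  }
}

d <- as.matrix(dist(cbind(x, y)))
diag(d) <- Inf
min(d)   # smallest interpoint distance, at least h

# Number of ordered pairs within distance r, for r below and above h
sapply(c(0.02, 0.04, 0.08), function(r) sum(d <= r))
```

The first two counts are zero, matching $g(r) = 0$ below the hard-core radius.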

Figure 3 shows the plot of $g(r)$ for the biscuit monster residences in the study location in 1980. From this plot, it is evident that the solid line (denoting $g(r)$) lies above the simulation envelope (the reference for CSR) between $r = 0.02$ and $r = 0.045$ units, indicating clustering at these distances. The red dotted line on the plot is the reference line for a pattern exhibiting CSR, and the grey region is the simulation envelope (from 100 simulated patterns of randomly distributed points).
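Reading the range of $r$ off the plot can also be done numerically. The hypothetical helper `ranges_above` below reports each contiguous range of $r$ over which the observed curve lies above the upper envelope. It assumes a data frame with columns `r`, `obs` and `hi`, such as `as.data.frame()` applied to a spatstat envelope object; the toy input is made up.

```r
# Hypothetical helper: contiguous ranges of r where obs lies above hi
ranges_above <- function(env) {
  above <- env$obs > env$hi
  runs <- rle(above)                        # runs of TRUE/FALSE along r
  ends <- cumsum(runs$lengths)              # last index of each run
  starts <- ends - runs$lengths + 1         # first index of each run
  keep <- runs$values                       # keep only the TRUE runs
  data.frame(from = env$r[starts[keep]], to = env$r[ends[keep]])
}

# Toy input (made-up numbers, not the biscuit data)
env <- data.frame(r   = seq(0, 0.06, by = 0.01),
                  obs = c(1, 1, 3, 4, 3, 1, 1),
                  hi  = rep(2, 7))
ranges_above(env)   # one range: from 0.02 to 0.04
```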

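library(spatstat)  # pcf(), envelope()

# 'regular' is the ppp object of 1980 residences created for Figure 2.
# Estimate g(r) and a pointwise envelope from 100 simulated CSR patterns
plot(envelope(regular, pcf, nsim = 100), ylim = c(0, 5),
     lwd = 3, main = "Figure 3: Pair correlation function for pattern of biscuit monster residences",
     legend = TRUE)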


Recipes - to simulate/upload your own “biscuit” data for analysis