library(spatstat)
library(ggplot2)
library(ggforce)

Overview

In this short script I’ve illustrate how point process statistics can be applied to the spatial considerations in social interaction in the context of the Covid-19 pandemic. For the simulated datasets, the observation window for each dataset is a 10x10 square representing a 10x10 m2 area. For each simulated individual, let xi be associated with a zone of influence Zi, such that Z(xi,r) is a disc of radius r = 1 metre, centered at the x-y location of each individual xi and that Z(xi,r)={a2:.

The area of the union of the zones of influence for each individual in a given observation window may be expressed as: \left|U_{\mathbf{x},r}\right|, in an inclusion-exclusion style (Nightingale et al., 2019,Picard et al., 2009). This is expressed concisely as: \begin{equation} \left|U_{\mathbf{x},r}\right| =\sum^{n(\mathbf{x})}_{i=1} \left|B(x_{i},r)\right| - \sum_{i<j} \left|B(x_{i},r) \cap B(x_{j},r)\right| + ... + (-1)^{n(\mathbf{x})+1}\left|\bigcap^{n(\mathbf{x})}_{i=1} B(x_{i},r)\right|.\end{equation}

Illustration of spatial distancing

I’ve simulated two point patterns, of the location of individuals (\alpha), and the location of cough particles in a cough cloud (\beta). I’ve simulated the points for \alpha from a Uniform distribution, and \beta, from a Normal distribution. In the figure below the two point patterns superimposed on each other in the observation window. The point pattern for the cough cloud can be described as a marked point pattern since each point has a specific marker; that is, large or small sized.

bapp = read.csv("bap.csv")
cough = read.csv("C:/Users/User/Desktop/May 2020/covid/cough.csv")
cough$particle = c(rep("large",100),rep("aerosol1",100),rep("aerosol2",100))
cough$colouring = c(rep("red",100),rep("purple",100),rep("green",100))
cough$shaping = c(rep(16,100),rep(15,100),rep(15,100))
coughlarge = cough[cough$particle=="large",]
cougha1 = cough[cough$particle=="aerosol1",]
cougha2 = cough[cough$particle=="aerosol2",]

ggplot(bapp,mapping = aes(bapp$x,bapp$y))+
  xlab("x")+ylab("y")+
  ggtitle("Figure 1: Zones of influence and cough clouds")+
  geom_circle(aes(x0=bapp$x,y0=bapp$y,r=1),inherit.aes = FALSE,
              col="goldenrod",fill = "goldenrod",alpha=0.2)+
  theme(text = element_text(size=40),legend.position="none")+coord_fixed(ratio = 1) +
  geom_point(coughlarge,mapping=aes(coughlarge$xcoords,coughlarge$ycoords),color="red",size=4)+
  geom_point(cougha1,mapping=aes(cougha1$xcoords,cougha1$ycoords),color="purple",size=2)+
  geom_point(cougha2,mapping=aes(cougha2$xcoords,cougha2$ycoords),color="darkgreen",size=2)+
  scale_x_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9,10))+
  scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9,10))+
  geom_point(col="black",pch=16,size=6)

Illustration of the area of non-overlap for three spatial configurations

Random spatial configuration

For this point pattern, I’ve simulated the points from a Poisson process using the rpois function in the R package spatstat. I’ve calculated the area of non-overalp between the zones of influence for each point using the function areaLoss in the R package spatstat.

random = read.csv("C:/Users/User/Desktop/May 2020/covid/random.csv")
ggplot(random,mapping = aes(random$x,random$y))+
  xlab("x")+ylab("y")+
  ggtitle("Figure 2: Random point pattern and zones of influence")+
  geom_circle(aes(x0=random$x,y0=random$y,r=1),inherit.aes = FALSE,
              fill = "red",alpha=0.5)+
  theme(text = element_text(size=40))+coord_fixed(ratio = 1) +
  geom_point(col="brown4",pch=16,size=6)+
scale_x_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9,10))+
  scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9,10))

random = read.csv("random.csv")
random = ppp(random$x,random$y,window=owin(c(0,10),c(0,10)))
randoma = areaLoss(random,1)

Area of non-overlap (m^{2})

## [1] 33.99106

Ripley’s K function

Ripley’s K function is useful for describing the distribution of points in a point pattern at various spatial scales. The estimation $ K(r)$ describes that distribution of a given point pattern \mathbf(x) at an inter-point distance r. When the esimation is computed at various vlaues of r, changes in the distribution can be easily detected. The point pattern could be clustered at small values of r and random at larger values.

One assumption for this function is that the generating process is stationary (i.e. spatially homogeneous).

The formula used for estimation is given as:

\hat K(r) = \frac{A}{n(n-1)} \sum _{i} \sum_{j} I(d_{ij} \leq r) e_{ij}

where A, n , i ,j, d_{ij}, e_{ij}, and I represent the area of the observation window, the number of points in the point pattern, the focal point, the neighbouring point, the distance between the focal point and neighbouring points, and edge correction weight and the Indicator function.

Frequently, the estimation of Ripley’s K function is compared to that expected for a “reference” point pattern (generated from a Poisson process) which exhibits complete spatial randomness (CSR). As shown in the figure below, the solid black line represents the estimation of the function for the data, the red dotted line, that for a point pattern exhibiting CSR, and the grey bands indicate the simulation envelope generated by simulating 100 point patterns from a Poisson process.

If the black solid line is above the simulation envelope, the observed point pattern is classified as being more clustered at that value of r than expected for a point pattern exhibiting CSR. If below, the point pattern is classified as being regular. If the black line is within the envelope, it is concluded that there is no evidence to suggest that the point pattern does not have a random distribution of points.

In the figure below, the solid black line lies within the simulation envelope; a strong indication that the observed point pattern has randomly distributed points. The distribution of points in that point pattern is not significantly different from that generated from a Poisson process.

## Generating 100 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
## 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.
## 
## Done.

Regular spatial configuration

For this point pattern, the code for its construction is provided below.

Area of non-overlap(m^{2})

x = c(rep(1,5),rep(3,5),rep(5,5),rep(7,5),rep(9,5))
y = c(rep(c(1,3,5,7,9),5))
regular = ppp(x,y,window=owin(c(0,10),c(0,10)))
regula = areaLoss(regular,1)
sum(regula)
## [1] 78.53982
regular = data.frame(regular)
ggplot(regular,mapping = aes(regular$x,regular$y))+
  xlab("x")+ylab("y")+
  ggtitle("Figure 3: Regular point pattern and zones of influence")+
  geom_circle(aes(x0=regular$x,y0=regular$y,r=1),inherit.aes = FALSE,
              fill = "darkseagreen",alpha=0.5)+
  theme(text = element_text(size=40))+coord_fixed(ratio = 1) +
  geom_point(col="darkgreen",pch=16,size=6)+
scale_x_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9,10))+
  scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9,10))

Ripley’s K function

In the figure below, the empirical K function lies predominantly under the simulation envelope; a strong indication that the points in the observed point pattern exhibit a more regular/inhibited/constrained distribution than that expected for a instance from a Poisson process.

## Generating 100 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
## 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.
## 
## Done.

Clustered spatial configuration

The clustered point pattern used was simulated using the rMatClust function in spatstat.

clustered = read.csv("C:/Users/User/Desktop/May 2020/covid/clustered.csv")
ggplot(clustered,mapping = aes(clustered$x,clustered$y))+
  xlab("x")+ylab("y")+
  ggtitle("Figure 4: Clustered point pattern and zones of influence")+
  geom_circle(aes(x0=clustered$x,y0=clustered$y,r=1),inherit.aes = FALSE,
             fill = "deeppink",alpha=0.5)+
  theme(text = element_text(size=40))+coord_fixed(ratio = 1) +
  geom_point(col="deeppink4",pch=16,size=6)+
  scale_x_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9,10))+
  scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9,10))

Area of non-overlap (m^{2})

clustered = read.csv("C:/Users/User/Desktop/May 2020/covid/clustered.csv")
cluster = ppp(clustered$x,clustered$y,window=owin(c(0,10),c(0,10)))
sum(areaLoss(cluster,1))
## [1] 8.492777

Ripley’s K function

In the figure below, the empirical K function lies predominantly above the simulation envelope; a strong indication that the distribution of the observed points is more clustered than that expected for a instantiation of a Poisson process.

## Generating 100 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
## 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.
## 
## Done.

References

  1. Nightingale G. F., Illian J. B., King R., and Nightingale P., September 2019. Area Interaction Point Processes for Bivariate Point Patterns in a Bayesian Context, Journal of Environmental Statistics, vol 9, no 2

  2. Picard N, Bar-Hen A, Mortier F, Chadouf J (2009). Multi Scale Marked Area Interaction Point Processes: A Model for the Spatial Pattern of Trees." Scandinavian Journal of Statistics