Disclaimer: The contents of this document come from Chapter 11 Point Pattern Analysis and the Appendix of that chapter of Intro to GIS and Spatial Analysis (Gimond, 2019). This document was prepared for CP6521 Advanced GIS, a graduate-level city planning elective course at Georgia Tech in Spring 2019. For any questions, contact the instructor, Yongsung Lee, Ph.D., via yongsung.lee(at)gatech.edu.
This document is also published on RPubs.

Motivation

Learning Objectives

What we do:

  1. Learn how to describe and analyze the spatial distribution of points
  2. Density-based approach
  3. Distance-based approach

11.1 Centrography

11.2 Density-Based Analysis

11.2.1 Global Density

  • The ratio of the observed number of points, n, to the study region’s surface area, a; that is, λ = n/a.
  • Not highly informative
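  • A minimal sketch (assuming spatstat is loaded and the starbucks ppp object from the exercises below is available with its window set): the global density is simply n divided by the window area.
library(spatstat)
n <- npoints(starbucks)            # observed number of points
a <- area.owin(Window(starbucks))  # surface area of the study region
n / a                              # global density (points per unit area)
intensity(starbucks)               # spatstat's built-in equivalent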

11.2.2 Local Density

  • Assess whether the density (and, by extension, the underlying process’s local intensity, λ̂ᵢ) is constant across the study area.
11.2.2.1 Quadrat Density
  • A study area is divided into sub-regions (aka quadrats).
  • Then, the point density is computed for each quadrat by dividing the number of points in each quadrat by the quadrat’s area.
  • The choice of quadrat numbers and quadrat shape can influence the measure of local density.
    • With very small quadrat sizes, you risk having many quadrats with no points which may prove uninformative.
    • With very large quadrat sizes, you risk missing subtle changes in spatial density distributions.
  • The quadrat analysis approach has the advantage of being easy to compute and interpret; however, it suffers from the modifiable areal unit problem (MAUP).
  • Quadrat regions can also be defined based on a covariate.

In-class exercises: Quadrat Density

  • First, let’s prepare the data sets.
load(url("http://github.com/mgimond/Spatial/raw/master/Data/ppa.RData"))

starbucks # A ppp point layer of Starbucks stores in Massachusetts;
ma        # An owin polygon layer of Massachusetts boundaries;
pop       # An im raster layer of population density distribution.
  • Load the libraries; if any of them are not installed yet, install them first.
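  • A one-time install sketch, in case any of these packages are missing (package names assumed from the library() calls below):
install.packages(c("rgdal", "maptools", "raster", "spatstat"))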
library(rgdal)
library(maptools)
library(raster)
# Load an MA.shp polygon shapefile 
s    <- readOGR(".", "MA")  # Don't add the .shp extension
ma    <- as(s, "owin") 

# Load a starbucks.shp point feature shapefile
s  <- readOGR(".","starbucks")  # Don't add the .shp extension
starbucks    <- as(s, "ppp")

# Load a pop_sqmile.img population density raster layer
img  <- raster("pop_sqmile.img")
pop  <- as.im(img)
  • All point pattern analysis tools used in this tutorial are available in the spatstat package.
  • These tools are designed to work with points stored as ppp objects and not SpatialPointsDataFrame or sf objects.
library(spatstat)
  • Note that a ppp object may or may not have attribute information (also referred to as marks).
  • For now, we will only concern ourselves with the pattern generated by the points and not their attributes.
  • Let’s remove the marks by setting them to NULL.
marks(starbucks)  <- NULL
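  • As a quick optional check (a sketch), spatstat's is.marked should now return FALSE.
is.marked(starbucks)  # FALSE once the marks have been removed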
  • Many point pattern analyses require that the study boundary be explicitly defined.
  • Here, we use spatstat::Window to bind the MA boundary polygon to the Starbucks ppp object.
plot(starbucks, main=NULL, cols=rgb(0,0,0,.2), pch=20) # before specifying the boundary 
Window(starbucks) <- ma                                # define the boundary  
plot(starbucks, main=NULL, cols=rgb(0,0,0,.2), pch=20) # after specifying the boundary  

  • Population density values for an administrative layer are usually quite skewed.
  • Transforming the skewed distribution in the population density covariate may help reveal relationships.
hist(pop, main=NULL, las=1)

pop.lg <- log(pop)
hist(pop.lg, main=NULL, las=1)

  • Divide Massachusetts into a grid of 3 rows and 6 columns, then count the number of points falling in each quadrat.
Q <- quadratcount(starbucks, nx= 6, ny=3)
plot(starbucks, pch=20, cols="grey70", main=NULL)  # Plot points
plot(Q, add=TRUE)  # Add quadrat grid

  • Compute the density for each quadrat
Q.d <- intensity(Q)
  • Plot the density
plot(intensity(Q, image=TRUE), main=NULL, las=1)  # Plot density raster; las=1 rotates axis labels (see http://rfunction.com/archives/1302)
plot(starbucks, pch=20, cex=0.6, col=rgb(0,0,0,.5), add=TRUE)  # Add points

  • spatstat::rescale rescales the length unit, here from meters to kilometers.
  • The second argument to rescale is the factor by which the current unit (meters) is divided to obtain the new unit (kilometers).
starbucks.km <- rescale(starbucks, 1000, "km")
ma.km <- rescale(ma, 1000, "km")
pop.km    <- rescale(pop, 1000, "km")
pop.lg.km <- rescale(pop.lg, 1000, "km")
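  • As an optional check (a sketch), spatstat's unitname reports the length unit attached to the rescaled objects.
unitname(starbucks.km)  # should now report kilometers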
  • Compute the density for each quadrat (in counts per km²)
Q   <- quadratcount(starbucks.km, nx= 6, ny=3)
Q.d <- intensity(Q)
  • Plot the density
plot(intensity(Q, image=TRUE), main=NULL, las=1)  # Plot density raster
plot(starbucks.km, pch=20, cex=0.6, col=rgb(0,0,0,.5), add=TRUE)  # Add points

  • We can use a covariate such as the population density raster to define non-uniform quadrats.
  • Divide the population density covariate into four regions (aka tessellated surfaces)
  • With an equal interval classification scheme
brk  <- c( -Inf, 4, 6, 8 , Inf)                 # Define the breaks
Zcut <- cut(pop.lg.km, breaks=brk, labels=1:4)  # Classify the raster
E    <- tess(image=Zcut)                        # Create a tesselated surface
  • Check the output: four areas defined by log population density
plot(E, main="", las=1) # no main title 

  • Tally the quadrat counts within each tessellated area
Q   <- quadratcount(starbucks.km, tess = E)  # Tally counts
  • Compute their density values (the number of points per square kilometer within each quadrat unit).
Q.d <- intensity(Q)  # Compute density
  • Plot the new quadrat density
plot(intensity(Q, image=TRUE), las=1, main=NULL)
plot(starbucks.km, pch=20, cex=0.6, col=rgb(1,1,1,.5), add=TRUE) 

  • Update the color scheme
cl <-  interp.colours(c("lightyellow", "orange" ,"red"), E$n)
plot(intensity(Q, image=TRUE), las=1, col=cl, main=NULL)
plot(starbucks.km, pch=20, cex=0.6, col=rgb(0,0,0,.5), add=TRUE)

11.2.2.2 Kernel Density
  • Like the quadrat density, the kernel approach computes a localized density for subsets of the study area.
  • Unlike its quadrat density counterpart, the sub-regions overlap one another, providing a moving sub-region window defined by a kernel.
  • Each cell is assigned the density value computed for the kernel window centered on that cell.
  • A kernel not only defines the shape and size of the window, but it can also weight the points following a well-defined kernel function.
11.2.2.3 Kernel Density Adjusted for a Covariate
  • Normalize the computed density (from our observed point pattern) to a measure of density expected from the underlying covariate.
  • A normalized density is denoted as rho, ρ.
  • In the book’s example, if the density distribution, ρ, can be explained by the elevation layer, we would expect:
    • A constant ρ value across all elevation values in the ρ vs. elevation plot
    • A constant ρ value across the entire spatial extent in the density map

In-class exercises: Kernel Density (Non-parametric)

  • spatstat’s density function computes an isotropic kernel intensity estimate of the point pattern.
K1 <- density(starbucks.km) # Using the default bandwidth
plot(K1, main=NULL, las=1)
contour(K1, add=TRUE)

# with a 50 km bandwidth
K2 <- density(starbucks.km, sigma=50) # Using a 50km bandwidth
plot(K2, main=NULL, las=1)
contour(K2, add=TRUE)

# change the kernel to a disc function type
K3 <- density(starbucks.km, kernel = "disc", sigma=50) # Using a 50km bandwidth
plot(K3, main=NULL, las=1)
contour(K3, add=TRUE)
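  • As an optional sketch (not part of the original exercise; object names here are illustrative), spatstat also provides data-driven bandwidth selectors such as bw.diggle, whose returned estimate can be passed to density as sigma.
bw.opt <- bw.diggle(starbucks.km)          # cross-validated bandwidth estimate (in km)
K4 <- density(starbucks.km, sigma=bw.opt)  # kernel density using the selected bandwidth
plot(K4, main=NULL, las=1)
contour(K4, add=TRUE)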

  • Compute rho using the ratio method
rho <- rhohat(starbucks.km, pop.lg.km,  method="ratio")
# Generate rho vs covariate plot: non-parametric fitting curve 
plot(rho, las=1, main=NULL)

pred <- predict(rho)
cl   <- interp.colours(c("lightyellow", "orange" ,"red"), 100) # Create color scheme
plot(pred, col=cl, las=1, main=NULL)

11.2.3 Modeling Intensity as a Function of a Covariate

  • The basic idea is to mathematically model the relationship between the distribution of points and some underlying covariate.
  • Poisson point process model: the underlying intensity at location i is an exponential function of the covariate(s) at that location, i.e., λ(i) = e^(α + βZ(i)), where Z(i) is the covariate value and α and β are the parameters to be estimated.

In-class exercises: Poisson Point Model

  • Create the Poisson point process model (Parametric)
PPM1 <- ppm(starbucks.km ~ pop.lg.km)
## Warning: Values of the covariate 'pop.lg.km' were NA or undefined at
## 0.57% (4 out of 699) of the quadrature points. Occurred while executing:
## ppm.ppp(Q = starbucks.km, trend = ~pop.lg.km, data = NULL, interaction =
## NULL)
  • Plot the relationship
plot(effectfun(PPM1, "pop.lg.km", se.fit=TRUE), main=NULL, las=1)

  • Check the Poisson point model
PPM1
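  • As an optional extension (a sketch, not part of the original exercise), the covariate model can be compared against a null model of constant intensity via a likelihood ratio test.
PPM0 <- ppm(starbucks.km ~ 1)   # null model: homogeneous (constant) intensity
anova(PPM0, PPM1, test="LRT")   # likelihood ratio test of the covariate effect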

11.3 Distance-Based Analysis

To be continued