Disclaimer: The contents of this document come from Chapter 11 Point Pattern Analysis and the Appendix of that chapter of Intro to GIS and Spatial Analysis (Gimond, 2019). This document was prepared for CP6521 Advanced GIS, a graduate-level city planning elective course at Georgia Tech in Spring 2019. For any questions, contact the instructor, Yongsung Lee, Ph.D., via yongsung.lee(at)gatech.edu.
This document is also published on RPubs.

Motivation

Learning Objectives

What we do:

  1. Learn how to describe and analyze the spatial distribution of points
  2. Density-based approach
  3. Distance-based approach

11.1 Centrography

11.2 Density-Based Analysis

11.2.1 Global Density

  • The ratio of the observed number of points, n, to the study region’s surface area, a; that is, λ = n/a.
  • Not highly informative
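  • A minimal sketch (assuming spatstat is loaded and the starbucks ppp object from the exercises below is available with its window set): the global density is simply n divided by the window area.
library(spatstat)
n <- npoints(starbucks)            # observed number of points
a <- area.owin(Window(starbucks))  # surface area of the study region
n / a                              # global density (points per unit area)
intensity(starbucks)               # spatstat's built-in equivalent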

11.2.2 Local Density

  • Assess whether the density (and, by extension, the underlying process’s local intensity, λ̂ᵢ) is constant across the study area.
11.2.2.1 Quadrat Density
  • A study area is divided into sub-regions (aka quadrats).
  • Then, the point density is computed for each quadrat by dividing the number of points in each quadrat by the quadrat’s area.
  • The choice of quadrat numbers and quadrat shape can influence the measure of local density.
    • With very small quadrat sizes, you risk having many quadrats with no points which may prove uninformative.
    • With very large quadrat sizes, you risk missing subtle changes in spatial density distributions.
  • The quadrat analysis approach has the advantage of being easy to compute and interpret; however, it suffers from the modifiable areal unit problem (MAUP).
  • Quadrat regions can also be defined based on a covariate.

In-class exercises: Quadrat Density

  • First, let’s prepare the data sets.
load(url("http://github.com/mgimond/Spatial/raw/master/Data/ppa.RData"))

starbucks # A ppp point layer of Starbucks stores in Massachusetts;
ma        # An owin polygon layer of Massachusetts boundaries;
pop       # An im raster layer of population density distribution.
  • Load the libraries; if any of them are not installed yet, install them first.
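  • A one-time install sketch, in case any of these packages are missing (package names assumed from the library() calls below):
install.packages(c("rgdal", "maptools", "raster", "spatstat"))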
library(rgdal)
library(maptools)
library(raster)
# Load an MA.shp polygon shapefile 
s    <- readOGR(".", "MA")  # Don't add the .shp extension
ma    <- as(s, "owin") 

# Load a starbucks.shp point feature shapefile
s  <- readOGR(".","starbucks")  # Don't add the .shp extension
starbucks    <- as(s, "ppp")

# Load a pop_sqmile.img population density raster layer
img  <- raster("pop_sqmile.img")
pop  <- as.im(img)
  • All point pattern analysis tools used in this tutorial are available in the spatstat package.
  • These tools are designed to work with points stored as ppp objects and not SpatialPointsDataFrame or sf objects.
library(spatstat)
  • Note that a ppp object may or may not have attribute information (also referred to as marks).
  • For now, we will only concern ourselves with the pattern generated by the points and not their attributes.
  • Let’s remove the marks by setting them to NULL.
marks(starbucks)  <- NULL
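  • As a quick optional check (a sketch), spatstat's is.marked should now return FALSE.
is.marked(starbucks)  # FALSE once the marks have been removed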
  • Many point pattern analyses require that the study boundary be explicitly defined.
  • Here, we use spatstat::Window to bind the MA boundary polygon to the Starbucks ppp object.
plot(starbucks, main=NULL, cols=rgb(0,0,0,.2), pch=20) # before specifying the boundary 
Window(starbucks) <- ma                                # define the boundary  
plot(starbucks, main=NULL, cols=rgb(0,0,0,.2), pch=20) # after specifying the boundary  

  • Population density values for an administrative layer are usually quite skewed.
  • Transforming the skewed distribution in the population density covariate may help reveal relationships.
hist(pop, main=NULL, las=1)

pop.lg <- log(pop)
hist(pop.lg, main=NULL, las=1)

  • Divide Massachusetts into a grid of 3 rows and 6 columns, then count the number of points falling in each quadrat.
Q <- quadratcount(starbucks, nx= 6, ny=3)
plot(starbucks, pch=20, cols="grey70", main=NULL)  # Plot points
plot(Q, add=TRUE)  # Add quadrat grid

  • Compute the density for each quadrat
Q.d <- intensity(Q)
  • Plot the density
plot(intensity(Q, image=TRUE), main=NULL, las=1)  # Plot density raster; las=1 rotates axis labels (see http://rfunction.com/archives/1302)
plot(starbucks, pch=20, cex=0.6, col=rgb(0,0,0,.5), add=TRUE)  # Add points

  • spatstat::rescale rescales the length unit, here from meters to kilometers.
  • The second argument to rescale is the factor by which the current unit (meters) is divided to obtain the new unit (kilometers).
starbucks.km <- rescale(starbucks, 1000, "km")
ma.km <- rescale(ma, 1000, "km")
pop.km    <- rescale(pop, 1000, "km")
pop.lg.km <- rescale(pop.lg, 1000, "km")
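  • As an optional check (a sketch), spatstat's unitname reports the length unit attached to the rescaled objects.
unitname(starbucks.km)  # should now report kilometers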
  • Compute the density for each quadrat (in counts per km²)
Q   <- quadratcount(starbucks.km, nx= 6, ny=3)
Q.d <- intensity(Q)
  • Plot the density
plot(intensity(Q, image=TRUE), main=NULL, las=1)  # Plot density raster
plot(starbucks.km, pch=20, cex=0.6, col=rgb(0,0,0,.5), add=TRUE)  # Add points

  • We can use a covariate such as the population density raster to define non-uniform quadrats.
  • Divide the population density covariate into four regions (aka tessellated surfaces)
  • With an equal interval classification scheme
brk  <- c( -Inf, 4, 6, 8 , Inf)                 # Define the breaks
Zcut <- cut(pop.lg.km, breaks=brk, labels=1:4)  # Classify the raster
E    <- tess(image=Zcut)                        # Create a tesselated surface
  • Check the output: four areas defined by log population density
plot(E, main="", las=1) # no main title 

  • Tally the quadrat counts within each tessellated area
Q   <- quadratcount(starbucks.km, tess = E)  # Tally counts
  • Compute their density values (the number of points per square kilometer within each quadrat unit).
Q.d <- intensity(Q)  # Compute density
  • Plot the new quadrat density
plot(intensity(Q, image=TRUE), las=1, main=NULL)
plot(starbucks.km, pch=20, cex=0.6, col=rgb(1,1,1,.5), add=TRUE) 

  • Update the color scheme
cl <-  interp.colours(c("lightyellow", "orange" ,"red"), E$n)
plot(intensity(Q, image=TRUE), las=1, col=cl, main=NULL)
plot(starbucks.km, pch=20, cex=0.6, col=rgb(0,0,0,.5), add=TRUE)

11.2.2.2 Kernel Density
  • Like the quadrat density, the kernel approach computes a localized density for subsets of the study area.
  • Unlike its quadrat density counterpart, the sub-regions overlap one another, providing a moving sub-region window defined by a kernel.
  • Each cell is assigned the density value computed for the kernel window centered on that cell.
  • A kernel not only defines the shape and size of the window, but it can also weight the points following a well-defined kernel function.
11.2.2.3 Kernel Density Adjusted for a Covariate
  • Normalize the computed density (from our observed point pattern) to a measure of density expected from the underlying covariate.
  • A normalized density is denoted as rho, ρ.
  • In the book’s example, if the density distribution, ρ, can be explained by the elevation layer, we would expect:
    • A constant ρ value across all elevation values in the ρ vs. elevation plot
    • A constant ρ value across the entire spatial extent in the density map

In-class exercises: Kernel Density (Non-parametric)

  • spatstat’s density function computes an isotropic kernel intensity estimate of the point pattern.
K1 <- density(starbucks.km) # Using the default bandwidth
plot(K1, main=NULL, las=1)
contour(K1, add=TRUE)

# with a 50 km bandwidth
K2 <- density(starbucks.km, sigma=50) # Using a 50km bandwidth
plot(K2, main=NULL, las=1)
contour(K2, add=TRUE)

# change the kernel to a disc function type
K3 <- density(starbucks.km, kernel = "disc", sigma=50) # Using a 50km bandwidth
plot(K3, main=NULL, las=1)
contour(K3, add=TRUE)
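  • As an optional sketch (not part of the original exercise; object names here are illustrative), spatstat also provides data-driven bandwidth selectors such as bw.diggle, whose returned estimate can be passed to density as sigma.
bw.opt <- bw.diggle(starbucks.km)          # cross-validated bandwidth estimate (in km)
K4 <- density(starbucks.km, sigma=bw.opt)  # kernel density using the selected bandwidth
plot(K4, main=NULL, las=1)
contour(K4, add=TRUE)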

  • Compute rho using the ratio method
rho <- rhohat(starbucks.km, pop.lg.km,  method="ratio")
# Generate rho vs covariate plot: non-parametric fitting curve 
plot(rho, las=1, main=NULL)

pred <- predict(rho)
cl   <- interp.colours(c("lightyellow", "orange" ,"red"), 100) # Create color scheme
plot(pred, col=cl, las=1, main=NULL)

11.2.3 Modeling Intensity as a Function of a Covariate

  • The basic idea is to mathematically model the relationship between the distribution of points and some underlying covariate.
  • Poisson point process model: the underlying intensity at location i is an exponential function of the covariate(s) at that location, i.e., λ(i) = e^(α + βZ(i)), where Z(i) is the covariate value and α and β are the parameters to be estimated.

In-class exercises: Poisson Point Model

  • Create the Poisson point process model (Parametric)
PPM1 <- ppm(starbucks.km ~ pop.lg.km)
## Warning: Values of the covariate 'pop.lg.km' were NA or undefined at
## 0.57% (4 out of 699) of the quadrature points. Occurred while executing:
## ppm.ppp(Q = starbucks.km, trend = ~pop.lg.km, data = NULL, interaction =
## NULL)
  • Plot the relationship
plot(effectfun(PPM1, "pop.lg.km", se.fit=TRUE), main=NULL, las=1)

  • Check the Poisson point model
PPM1
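  • As an optional extension (a sketch, not part of the original exercise), the covariate model can be compared against a null model of constant intensity via a likelihood ratio test.
PPM0 <- ppm(starbucks.km ~ 1)   # null model: homogeneous (constant) intensity
anova(PPM0, PPM1, test="LRT")   # likelihood ratio test of the covariate effect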

11.3 Distance-Based Analysis

To be continued