Predicting spatial locations using point processes

Simon Barthelmé, University of Geneva

In many fields it is essential to predict where things are going to happen: earthquakes, storms, tornadoes, or perhaps, if you're a psychologist, eye movements. Point processes are an essential tool for the task, if only because they clarify the statistical nature of the problem. See here for a mostly nontechnical intro to point processes or this great book by Illian et al for in-depth material. Simple point processes are easy enough to work with, but for prediction you might need so-called doubly stochastic point processes, which are rather strange beasts.

Let's say you work for an insurance company that specialises in covering alien abductions. Every time one of the policy holders disappears, you send an agent to investigate. Sending agents far away costs money, so ideally you'd like to locate your agents close to likely hotspots. We'll call disappearances of policy holders “events”, and the goal is to predict where those events are likely to take place. Not only are the locations unknown, you don't even know how many events will take place. What you do know is where events took place in previous years.

In this problem the object of interest (what we're trying to predict) is a point set: a finite set of spatial locations. We don't know how many events are going to take place, so the size of the set is unknown. To deal with the problem statistically, we need a probability distribution over the right objects, namely, finite sets of spatial locations. A point process provides just that, and the simplest point process model is the Poisson process. For a Poisson process you need the following ingredients:

  1. a “window of observation”, or domain, which is the area where events can take place (or be recorded). In our example, the window of observation could be the USA, if that's where the company operates.

  2. an “intensity function”, \( \lambda(s) \), where \( s \) is a spatial location.

In the Poisson process, the intensity function determines where events are likely to happen: the expected number of events in an area \( A \) is the integral of the intensity function over \( A \). The Poisson process is so called because event counts in an area \( A \) are Poisson distributed, for every possible area \( A \). The expected count comes from the intensity function, and the variability around it from the corresponding Poisson distribution.
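Here's a quick numerical check of both properties (a minimal sketch with a made-up intensity on the unit square, assuming spatstat is installed; we introduce the package properly just below):

library(spatstat)
lambda <- function(x, y) 100 * x  #Made-up intensity, increasing with x
A <- owin(c(0, 0.5), c(0, 1))  #A sub-region of the unit square
counts <- replicate(500, npoints(rpoispp(lambda, win = square(1))[A]))
mean(counts)  #Should be close to the integral of lambda over A, i.e. 12.5
var(counts)  #Poisson counts: variance should be close to the mean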

Here's how to generate samples from a Poisson process using R's spatstat package (by Adrian Baddeley).

We set up a window of observation:

library(spatstat)
library(maps)
library(ggplot2)  #Provides map_data(), and plotting later on
library(plyr)  #Provides rlply()/rdply(), used later on
us <- map_data("usa")

# Simplified outline of the USA
us.simple <- function() {
    map <- data.frame(x = us$long, y = us$lat)
    map <- subset(map, !is.na(x))
    map <- map[!duplicated(paste(map$x, map$y)), ]
    # Hand-picked indices giving a coarse polygon outline
    points <- c(177, 563, 1296, 1683, 1857, 2420, 3649, 3887, 4034, 4241, 4326, 
        4422, 4624, 4861, 5010, 5071, 5331, 5559, 5744, 5833, 6097, 6315, 6608, 
        6748)
    map[points, ]
}
ow <- owin(range(us$long), range(us$lat), poly = us.simple())
plot(ow$xrange, ow$yrange, type = "n", xlab = "Long.", ylab = "Lat.")
polygon(ow$bdry[[1]], col = rgb(0.5, 0.5, 0.1, 0.4))

[Figure: the window of observation, a simplified outline of the USA]

The homogeneous point process has a constant intensity function. Using rpoispp we generate two samples from a homogeneous PP:

set.seed(1)
lfun <- function(x, y) rep(0.1, length(x))  #Constant intensity
r1 <- rpoispp(lfun, win = ow)
r2 <- rpoispp(lfun, win = ow)
plot.result <- function(rp) {
    plot(ow$xrange, ow$yrange, type = "n", xlab = "Long.", ylab = "Lat.")
    polygon(ow$bdry[[1]], col = rgb(0.5, 0.5, 0.1, 0.4))
    points(rp$x, rp$y, pch = 19)
    text(-115, 28, sprintf("Total number\n of events: %i", rp$n))
}
layout(t(1:2))
plot.result(r1)
plot.result(r2)

[Figure: two samples from the homogeneous Poisson process]

If the intensity decreases from east to west, we get a spatial trend:

set.seed(1)
lfun <- function(x, y) exp(0.11 * x + 8)  #East-west trend
r1 <- rpoispp(lfun, win = ow)
r2 <- rpoispp(lfun, win = ow)
layout(t(1:2))
plot.result(r1)
plot.result(r2)

[Figure: two samples from the Poisson process with an east-west trend]

Let's assume we know that events are generated from a PP with the intensity function of the example above. The company has one agent who needs to travel to the location of each event, at a cost proportional to the straight-line distance from the agent's base of operations. Write \( b=(x_b,y_b) \) for the location of the base. For a given set \( s \) of event locations, the total cost is \[ c(s;b) = \sum_{s_i \in s} d(b,s_i) \] where \( d(b,s_i) \) is the distance from the base to event location \( s_i \).

The expected cost is the average of \( c(s;b) \) over potential point sets \( s \); write \( E(b) \) for its value at base location \( b \). Under the Poisson assumption we could compute it analytically, but it's easy to find a Monte Carlo approximation: generate a whole bunch of location sets and take the average.
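(For the record, the analytic route goes through Campbell's theorem, which for a Poisson process turns the expectation of a sum over points into an integral against the intensity: \[ E(b) = \int_W d(b,u)\, \lambda(u)\, du \] with \( W \) the window of observation. The Monte Carlo approach below simply replaces this integral with an average over simulated point sets.)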

rs <- rlply(200, rpoispp(lfun, win = ow))  #200 simulated location sets
cost <- function(s, b) sum(sqrt((s$x - b$x)^2 + (s$y - b$y)^2))  #Total cost c(s;b)
ecost <- function(b) mean(sapply(rs, function(s) cost(s, b)))  #Monte Carlo estimate of E(b)
# Generate a grid of candidate base locations
X <- expand.grid(x = seq(ow$xrange[1], ow$xrange[2], l = 40), y = seq(ow$yrange[1], 
    ow$yrange[2], l = 40))
# Keep only the ones inside the obs. window
inw <- inside.owin(X[, 1], X[, 2], ow)
cst <- sapply(which(inw), function(ind) ecost(X[ind, ]))
df <- data.frame(x = X[inw, 1], y = X[inw, 2], ecost = cst)
ggplot(df, aes(x, y)) + geom_raster(aes(fill = ecost)) + geom_polygon(data = as.data.frame(us.simple()), 
    alpha = 0, col = "black", lwd = 4) + labs(x = "Long.", y = "Lat.", fill = "Expected cost") + 
    geom_contour(aes(z = ecost))

[Figure: map of the expected cost as a function of base location]

This map tells us our agent should be in the Northeast, where most of the events are likely to take place.
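As a sanity check, we can compare the Monte Carlo value at a single base location with the Campbell integral, approximated on a grid (a quick sketch; the base location and grid spacing are arbitrary):

b <- data.frame(x = -90, y = 40)  #A hypothetical base location
delta <- 0.1  #Grid spacing in degrees, for a crude quadrature
G <- expand.grid(x = seq(ow$xrange[1], ow$xrange[2], by = delta), y = seq(ow$yrange[1], 
    ow$yrange[2], by = delta))
G <- G[inside.owin(G$x, G$y, ow), ]  #Keep grid points inside the window
campbell <- sum(sqrt((G$x - b$x)^2 + (G$y - b$y)^2) * lfun(G$x, G$y)) * delta^2
c(campbell = campbell, monte.carlo = ecost(b))  #The two values should agree closely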

In actual fact we do not know the true intensity function; we have to estimate it. Let's say we have data from the previous year, and we fit a Poisson process model with a log-linear trend along the x axis:

lfun.true <- function(x, y) exp(0.11 * x + 7.5)  #True intensity function
ppdat <- rpoispp(lfun.true, win = ow)  #Simulated data standing in for last year's events
fit <- ppm(ppdat, ~x)  #Fit a PP model with log-linear intensity:
# log lambda(x,y) = beta[1] + beta[2]*x
coef(fit)
## (Intercept)           x 
##      7.2811      0.1064

These values are the maximum likelihood estimates of the coefficients, but they come with some uncertainty, and we have to take that uncertainty into account when making predictions. A possible solution is to use a Bayesian predictive distribution, as we do here. Let's call the old data (the set of locations from last year) \( s_t \). We are trying to predict next year's data \( s_{t+1} \). We use the basic predictive formula: \[ p(s_{t+1}|s_t) = \int p(s_{t+1}|\beta)\, p(\beta|s_t)\, d\beta \]

\( p(\beta|s_t) \) is the posterior distribution for the coefficients given current data. The coefficients define the intensity function. \( p(s_{t+1}|\beta) \) is the probability of observing \( s_{t+1} \) given the coefficients, and that's a PP distribution. What the formula says is that in order to generate a set of points from the predictive point process, you can first sample coefficients from the posterior, then sample from a PP with intensity function given by the sampled coefficients.

The ppm function in spatstat does not produce posterior samples, but we'll make the optimistic assumption that the posterior is roughly Gaussian around the maximum likelihood estimate.
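Concretely, this is a Laplace-type approximation: under a flat prior, \[ p(\beta|s_t) \approx \mathcal{N}(\hat{\beta}, \hat{V}) \] where \( \hat{\beta} \) is the maximum likelihood estimate and \( \hat{V} \) its estimated covariance matrix. In code: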

library(mvtnorm)  #For rmvnorm(), multivariate Gaussian samples
m.est <- coef(fit)  #Approx. posterior mean
V <- vcov(fit)  #Approx. posterior covariance, via the vcov method for ppm objects
R <- rmvnorm(50, mean = m.est, sigma = V)  #Approx. posterior samples
plot(R, pch = 19, xlab = expression(beta[1]), ylab = expression(beta[2]), main = "Posterior samples")

[Figure: posterior samples of the coefficients]

From posterior samples we can generate predictive samples:

gen.sample.pred <- function() {
    beta <- rmvnorm(1, mean = m.est, sigma = V)  #Step 1: draw coefficients from the posterior
    lfun <- function(x, y) exp(beta[1] + beta[2] * x)
    pp <- rpoispp(lfun, win = ow)  #Step 2: sample a PP with that intensity
    data.frame(x = pp$x, y = pp$y)
}
ggplot(rdply(6, gen.sample.pred()), aes(x, y)) + facet_wrap(~.n) + geom_point() + 
    geom_polygon(data = as.data.frame(us.simple()), fill = rgb(0.5, 0.5, 0.1), 
        alpha = 0.4, lwd = 1) + labs(x = "Long.", y = "Lat.", title = "Samples from the predictive process")

[Figure: six samples from the predictive process]

The figure above shows 6 different samples from the predictive process; each sample is a point set.

The predictive process is a more complicated object than a simple Poisson point process. Here are some important properties.

The predictive process is a Cox process

A Cox process, or “doubly stochastic process”, is a point process generated by:

  1. sampling a random intensity function \( \lambda(s) \);

  2. sampling from the PP with intensity function \( \lambda(s) \).

That's exactly how the predictive process works: step 1 corresponds to picking random coefficients from the posterior, which determine the intensity function.
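Here's a minimal illustration of the double stochasticity, using the simplest possible Cox process: the intensity is a random constant drawn from a Gamma distribution (parameters made up for the example). Unlike Poisson counts, the resulting event counts have a variance well above their mean:

set.seed(2)
counts <- replicate(1000, {
    Lambda <- rgamma(1, shape = 4, rate = 0.05)  #Step 1: sample a (constant) intensity
    npoints(rpoispp(Lambda, win = square(1)))  #Step 2: sample a PP given that intensity
})
c(mean = mean(counts), var = var(counts))  #For a Poisson process these would be equal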

Points are no longer independent in the predictive process

A defining feature of the Poisson process is that knowing the location of one point in the set tells you nothing about where the other ones might be: the points are independent of one another. In the predictive process this is no longer true, because the strength of the spatial trend varies randomly from sample to sample. Suppose we're allowed to see part of a sample from the predictive process, and that this part contains a large fraction of points in the Northeast. This would tell us that the sample was probably generated with an intensity function with a strong spatial trend, and the rest of the points would then also be likely to lie in the Northeast.
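We can see this dependence numerically (a sketch; the split at longitude -95 is arbitrary, and the comparison process uses the fixed east-west intensity lfun from earlier). Across samples from the predictive process, counts in the western and eastern halves of the window are positively correlated, because they share the same random intensity; for a Poisson process, counts in disjoint regions are independent.

split.count <- function(s) c(west = sum(s$x < -95), east = sum(s$x >= -95))
pred <- t(replicate(100, split.count(gen.sample.pred())))
pois <- t(replicate(100, split.count(rpoispp(lfun, win = ow))))
#Positive correlation for the predictive process, near zero for the Poisson one
round(c(predictive = cor(pred[, "west"], pred[, "east"]), poisson = cor(pois[, "west"], 
    pois[, "east"])), 2)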

The predictive process is not the same as the fitted process

The fitted process is the maximum-likelihood estimate of the PP we got from the data. Its intensity function is fixed. Here's how to generate samples from the fitted process:

gen.sample.fitted <- function() {
    lfun <- function(x, y) exp(m.est[1] + m.est[2] * x)  #Intensity fixed at the MLE
    pp <- rpoispp(lfun, win = ow)
    data.frame(x = pp$x, y = pp$y)
}

ggplot(rdply(6, gen.sample.fitted()), aes(x, y)) + facet_wrap(~.n) + geom_point() + 
    geom_polygon(data = as.data.frame(us.simple()), fill = rgb(0.5, 0.5, 0.1), 
        alpha = 0.4, lwd = 1) + labs(x = "Long.", y = "Lat.", title = "Samples from the fitted process")

[Figure: six samples from the fitted process]

The fitted process neglects estimation uncertainty. In general, minimising an expected cost under the fitted and predictive process will lead to different results.
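For instance, here's how one could compare the expected cost at a single (hypothetical) base location under the two processes; the values will generally differ, and so, in general, will the base locations that minimise them:

b <- data.frame(x = -100, y = 40)  #A hypothetical base location
rs.fitted <- rlply(200, gen.sample.fitted())
rs.pred <- rlply(200, gen.sample.pred())
c(fitted = mean(sapply(rs.fitted, cost, b = b)), predictive = mean(sapply(rs.pred, 
    cost, b = b)))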

Variability across point sets is not the same as variability within point sets

A sample from a point process is a set of points. The points in a particular sample may look very scattered, but that doesn't mean there will be a lot of variability across samples: you can imagine a point process that always generates the exact same set of scattered points. Conversely, you could have little within-sample variability but great between-sample variability. Imagine an outbreak model for the data. Every year, the aliens land somewhere in the USA and abduct everybody within a ten-km radius. They can land pretty much anywhere, but they always operate within the same limited radius around the landing site. Seen as a point process, the abduction locations will exhibit little within-sample variability but large across-sample variability.

A surprising consequence is that some features of the data might be very easy to predict and others not: in the outbreak model it's very difficult to predict the average location, but very easy to predict the variance in location.
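Here's a toy implementation of the outbreak model to make the point concrete (purely illustrative: the radius and rate are made up, and distances are in degrees rather than km). The mean location jumps around from sample to sample while the within-sample spread stays stable; note that this model, too, is a Cox process, the random element being the landing site.

gen.outbreak <- function(radius = 0.5, rate = 200) {
    centre <- runifpoint(1, win = ow)  #The aliens land anywhere in the window
    zone <- disc(radius, centre = c(centre$x, centre$y))  #Fixed operating radius
    rpoispp(rate, win = intersect.owin(zone, ow))  #Abductions around the landing site
}
outbreaks <- rlply(6, gen.outbreak())
#mean.x varies wildly across samples; sd.x stays small and stable
round(sapply(outbreaks, function(s) c(mean.x = mean(s$x), sd.x = sd(s$x))), 2)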