Point pattern analysis plays a critical role in many fields such as physical geography, ecology, and epidemiology. This report focuses specifically on crime analysis. While the literature discusses dozens of statistical methods to quantify point patterns, this report will cover a handful: kernel density, nearest neighbor distance, and descriptive statistics.
Before working through a real-world example, it is essential to understand point patterns and how the eye can see things that aren’t necessarily there.
R’s spatstat package makes it quick and easy to simulate point patterns to test our intuition about clustering. In Figure 1, the rpoispp() function was used to generate three realizations of a homogeneous Poisson process with intensity 100 on the unit square, so each pattern contains roughly 100 points (the exact count is itself Poisson-distributed).
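library(spatstat)  # point pattern simulation and analysis
par(mfrow = c(1, 3))  # three panels side by side
# three independent realizations of a homogeneous Poisson process with intensity 100
pp1 <- rpoispp(100)
plot(pp1)
pp2 <- rpoispp(100)
plot(pp2)
pp3 <- rpoispp(100)
plot(pp3)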
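To make the idea of a homogeneous Poisson process concrete, the following is a minimal base-R sketch of what a call such as rpoispp(100) does on the unit square: draw the number of points from a Poisson distribution whose mean is intensity times area, then scatter that many points uniformly and independently. The helper simulate_csr() is hypothetical and is not spatstat's implementation.

```r
# Hypothetical helper: a homogeneous Poisson process on a rectangle (sketch only).
simulate_csr <- function(lambda, xrange = c(0, 1), yrange = c(0, 1)) {
  area <- diff(xrange) * diff(yrange)
  # Step 1: the number of points is Poisson with mean lambda * area
  n <- rpois(1, lambda * area)
  # Step 2: given n, points are independent and uniformly distributed
  data.frame(x = runif(n, xrange[1], xrange[2]),
             y = runif(n, yrange[1], yrange[2]))
}

set.seed(586)
counts <- replicate(1000, nrow(simulate_csr(100)))
mean(counts)  # should be close to 100
var(counts)   # should also be close to 100, as for any Poisson count
```

Repeating the simulation shows why each panel in Figure 1 contains only roughly 100 points: the count varies from run to run, with a variance about equal to its mean.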
Although all of these points are randomly generated by a Poisson point
process and meet the definition of complete spatial randomness (each
location has an equal chance of containing an event, and events are
independently distributed), our brains can still perceive what seem to
be meaningful clusters in the data. These “cluster hallucinations” can
be scale-dependent: when zooming in on a given patch of the unit square,
viewers may notice that some point locations are nearly identical, but
this is merely by chance. When looking for potential first-order
effects (variation in intensity across the study area), the points
generally appear evenly spread across the unit square; however, this
isn’t always the case with point processes.
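One way to check whether an apparent cluster is more than a hallucination is a quadrat-count test: divide the window into equal cells and compare the observed counts with the equal counts expected under complete spatial randomness using a chi-square statistic. spatstat provides quadrat.test() for this; the base-R sketch below, built around the hypothetical helper quadrat_test(), only illustrates the idea on the unit square.

```r
# Hypothetical helper: chi-square quadrat-count test for CSR on the unit square.
quadrat_test <- function(x, y, nx = 4, ny = 4) {
  # assign each point to a cell of an nx-by-ny grid
  ix <- cut(x, breaks = seq(0, 1, length.out = nx + 1), include.lowest = TRUE)
  iy <- cut(y, breaks = seq(0, 1, length.out = ny + 1), include.lowest = TRUE)
  counts <- as.vector(table(ix, iy))     # includes empty cells
  expected <- length(x) / (nx * ny)      # equal-area cells share the expectation
  stat <- sum((counts - expected)^2 / expected)
  df <- nx * ny - 1
  list(counts = counts, statistic = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

set.seed(42)
n <- rpois(1, 100)
res <- quadrat_test(runif(n), runif(n))
res$p.value
```

For a pattern generated as in Figure 1, the p-value is usually not small even when clusters seem visible, whereas points packed into one corner of the square give a p-value near zero.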
The rpoispp() function also accepts an intensity function lambda(x, y) in place of a constant, which produces an inhomogeneous Poisson process. In this example (Figure 2), points are still placed independently of one another, but the intensity equals 200 times the distance from the center (0.5, 0.5): it is zero at the center and rises to about 141 at the corners, so locations farther from the center are more likely to contain points.
par(mfrow = c(1, 3))
# intensity function: 200 times the distance from the center of the unit square
lambda_fun <- function(x, y) { 200 * sqrt((x - 0.5)^2 + (y - 0.5)^2) }
pp4 <- rpoispp(lambda_fun)
plot(pp4)
pp5 <- rpoispp(lambda_fun)
plot(pp5)
pp6 <- rpoispp(lambda_fun)
plot(pp6)
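A common way to simulate an inhomogeneous Poisson process like the one in Figure 2 is thinning: simulate a homogeneous process at the maximum intensity, then keep each point independently with probability lambda(x, y) divided by that maximum. The sketch below, with the hypothetical helper simulate_inhom(), applies this technique in base R to the same intensity function; it is an illustration, not a claim about how rpoispp() is implemented internally.

```r
# Hypothetical helper: inhomogeneous Poisson process on the unit square by thinning.
# lmax must be an upper bound for lambda_fun over the square.
simulate_inhom <- function(lambda_fun, lmax) {
  # 1. homogeneous process at the maximum intensity (the unit square has area 1)
  n <- rpois(1, lmax)
  x <- runif(n)
  y <- runif(n)
  # 2. keep each point with probability lambda(x, y) / lmax
  keep <- runif(n) < lambda_fun(x, y) / lmax
  data.frame(x = x[keep], y = y[keep])
}

lambda_fun <- function(x, y) 200 * sqrt((x - 0.5)^2 + (y - 0.5)^2)
lmax <- 200 * sqrt(0.5)   # the intensity peaks at the corners (about 141.4)

set.seed(1)
pts <- simulate_inhom(lambda_fun, lmax)
r <- sqrt((pts$x - 0.5)^2 + (pts$y - 0.5)^2)
mean(r < 0.2)   # share of points near the center
```

Because the intensity vanishes at the center, the expected number of points is lower than in Figure 1: about 76.5, that is, 200 times the mean distance from the center of the unit square to a uniform point (roughly 0.383).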
In Figure 2, the sparsity of points near the center is noticeable.
As with the homogeneous process in Figure 1, our brains can still see
phantom patterns, especially depending on the scale at which the point
pattern is viewed.
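spatstat also provides functions to simulate point patterns in which the location of one point depends on the locations of other points. In this example, rSSI() and rThomas() are used. rSSI(0.05, 100) implements simple sequential inhibition: points are proposed one at a time at random locations, and any proposal closer than 0.05 to an existing point is rejected, until 100 points have been accepted. rThomas(0.9, 0.2, 10) generates a Thomas cluster process: parent points follow a Poisson process with intensity kappa = 0.9, and each parent produces a Poisson number of offspring (mean mu = 10) displaced from it by a Gaussian offset with standard deviation scale = 0.2.
Figure 3 shows the output of each function.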
par(mfrow = c(1, 2))
pp7 <- rSSI(r = 0.05, n = 100)
plot(pp7)
pp8 <- rThomas(kappa = 0.9, scale = 0.2, mu = 10)
plot(pp8)
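The rSSI() output (pp7) appears more regular than the output of
rThomas() (pp8), which is the intended outcome. With kappa = 0.9, the
expected number of parent points per unit area is less than one, so
this realization contained only one cluster; rerunning rThomas() with
the same parameters may produce no clusters or several.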
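The two processes behind Figure 3 can be sketched in base R to show where the dependence between points comes from. simulate_ssi() and simulate_thomas() are hypothetical helpers that follow the textbook definitions; they ignore edge effects that spatstat accounts for (for example, Thomas parents lying just outside the square), so they will not reproduce rSSI() or rThomas() exactly.

```r
# Hypothetical helper: simple sequential inhibition on the unit square.
# Proposals closer than r to an accepted point are rejected.
simulate_ssi <- function(r, n, max_tries = 1e5) {
  x <- numeric(0)
  y <- numeric(0)
  tries <- 0
  while (length(x) < n && tries < max_tries) {
    tries <- tries + 1
    px <- runif(1)
    py <- runif(1)
    # all() of an empty vector is TRUE, so the first proposal is always accepted
    if (all((x - px)^2 + (y - py)^2 >= r^2)) {
      x <- c(x, px)
      y <- c(y, py)
    }
  }
  data.frame(x = x, y = y)
}

# Hypothetical helper: Thomas cluster process on the unit square.
# Poisson(kappa) parents, Poisson(mu) offspring each, N(0, scale^2) offsets;
# offspring landing outside the square are dropped.
simulate_thomas <- function(kappa, scale, mu) {
  n_parents <- rpois(1, kappa)
  px <- runif(n_parents)
  py <- runif(n_parents)
  n_off <- rpois(n_parents, mu)
  x <- rep(px, n_off) + rnorm(sum(n_off), sd = scale)
  y <- rep(py, n_off) + rnorm(sum(n_off), sd = scale)
  inside <- x >= 0 & x <= 1 & y >= 0 & y <= 1
  data.frame(x = x[inside], y = y[inside])
}

set.seed(7)
ssi <- simulate_ssi(0.05, 100)
thomas <- simulate_thomas(0.9, 0.2, 10)
min(dist(ssi))   # never below 0.05 by construction
```

Inhibition works by rejection, so SSI points can never sit closer than r, while Thomas points cluster because they share a parent, not because they interact with each other directly.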
Recognizing that random point patterns can appear clustered or uniform, the second part of this project focuses on a real-world example: analyzing the point patterns of reported crimes in St. Louis, MO, from 2013 to 2014.
This analysis uses two shapefiles: the reported crime points (stl20132014sube.shp) and the St. Louis city boundary (stl_boundary.shp).
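library(sf)  # provides read_sf()
# reported crime points and the St. Louis boundary polygon
crimes_shp_prj <- read_sf("C:/temp/PSU_operational/GEOG586/Lesson3/Geog586_Les3_Project/gis/stl20132014sube.shp")
StLouis_BND_prj <- read_sf("C:/temp/PSU_operational/GEOG586/Lesson3/Geog586_Les3_Project/gis/stl_boundary.shp")
# convert the boundary polygon to a spatstat observation window
Sbnd <- as.owin(StLouis_BND_prj)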
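Before analysis begins, the crime data need to be converted to a spatstat point pattern (ppp) object, using the MEAN_XCoor and MEAN_YCoor attributes as coordinates, the city boundary as the window, and the attribute table as marks.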
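# attribute table without the sf geometry column, used as marks
crimes_marks <- as.data.frame(st_drop_geometry(crimes_shp_prj))
# ppp() rejects (with a warning) any points that fall outside the window
crimesppp <- ppp(crimes_shp_prj$MEAN_XCoor, crimes_shp_prj$MEAN_YCoor, window = Sbnd, marks = crimes_marks)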
Only three crime types (homicide, car theft, and begging) will be discussed in this analysis. Each crime displays a unique point pattern.
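# subset on the crimet2 mark so the selection stays aligned with the points,
# even if ppp() rejected some records
homicide <- crimesppp[marks(crimesppp)$crimet2 == "homicide"]
car_theft <- crimesppp[marks(crimesppp)$crimet2 == "car_theft"]
begging <- crimesppp[marks(crimesppp)$crimet2 == "begging"]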
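The bandwidth of a kernel (sigma, the standard deviation of the Gaussian kernel used by spatstat’s density()) can play a significant role in the conclusions drawn from the analysis. By default, density() sets sigma to one-eighth of the shortest side of the window’s bounding rectangle, while bw.diggle() chooses sigma by minimizing a mean-square error criterion. To get an initial look at each of the three point patterns and see the role bandwidth plays in kernel density estimation, Figure 4 shows a kernel density estimate for each crime type (columns) under four bandwidths (rows): the default, and the bw.diggle() bandwidths for homicide, car theft, and begging.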
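# bandwidths selected by bw.diggle() for each crime type
homicide_opt <- bw.diggle(homicide)
car_theft_opt <- bw.diggle(car_theft)
begging_opt <- bw.diggle(begging)
# 4 x 3 grid: rows = bandwidth, columns = crime type
par(mfrow = c(4, 3))
# homicide default bandwidth
KHom_default <- density(homicide)
plot(KHom_default, main = NULL, las = 1)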
# car theft default bandwidth
KCar_default <- density(car_theft)
plot(KCar_default, main = NULL, las = 1)
# begging default bandwidth
KBeg_default <- density(begging)
plot(KBeg_default, main = NULL, las = 1)
# homicide optimized bandwidth
KHom_opt <- density(homicide, sigma = homicide_opt)
plot(KHom_opt, main = NULL, las = 1)
# car theft, homicide optimized bandwidth
KCar_hom_opt <- density(car_theft, sigma = homicide_opt)
plot(KCar_hom_opt, main = NULL, las = 1)
# begging, homicide optimized bandwidth
KBeg_hom_opt <- density(begging, sigma = homicide_opt)
plot(KBeg_hom_opt, main = NULL, las = 1)
# homicide, car theft optimized bandwidth
KHom_car_opt <- density(homicide, sigma = car_theft_opt)
plot(KHom_car_opt, main = NULL, las = 1)
# car theft optimized bandwidth
KCar_opt <- density(car_theft, sigma = car_theft_opt)
plot(KCar_opt, main = NULL, las = 1)
# begging, car theft optimized bandwidth
KBeg_car_opt <- density(begging, sigma = car_theft_opt)
plot(KBeg_car_opt, main = NULL, las = 1)
# homicide, begging optimized bandwidth
KHom_beg_opt <- density(homicide, sigma = begging_opt)
plot(KHom_beg_opt, main = NULL, las = 1)
# car theft, begging optimized bandwidth
KCar_beg_opt <- density(car_theft, sigma = begging_opt)
plot(KCar_beg_opt, main = NULL, las = 1)
# begging optimized bandwidth
KBeg_opt <- density(begging, sigma = begging_opt)
plot(KBeg_opt, main = NULL, las = 1)
As can be seen in Figure 4, the perceived intensity of a point pattern
can be greatly affected by the choice of bandwidth. While the bandwidth
“optimized” by bw.diggle() produced a more informative surface than the
default bandwidth for each crime, I found that some optimized bandwidths
better fit the data from another crime (for example, the car theft
bandwidth applied to the begging data).
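To see why sigma matters so much, here is a minimal base-R sketch of an isotropic Gaussian kernel intensity estimate evaluated on a grid over the unit square. kernel_intensity() is a hypothetical helper; unlike spatstat’s density(), it applies no edge correction, and the simulated data (a tight cluster plus uniform background points) are made up purely for illustration.

```r
# Hypothetical helper: Gaussian kernel intensity estimate on a grid (no edge correction).
# sigma plays the same role as the sigma argument of spatstat's density().
kernel_intensity <- function(x, y, sigma, gx, gy) {
  grid <- expand.grid(gx = gx, gy = gy)   # gx varies fastest, matching matrix() fill order
  z <- vapply(seq_len(nrow(grid)), function(k) {
    d2 <- (x - grid$gx[k])^2 + (y - grid$gy[k])^2
    # sum of bivariate normal kernels centered on the points
    sum(exp(-d2 / (2 * sigma^2))) / (2 * pi * sigma^2)
  }, numeric(1))
  matrix(z, nrow = length(gx), ncol = length(gy))
}

set.seed(3)
x <- c(rnorm(50, 0.3, 0.03), runif(50))
y <- c(rnorm(50, 0.3, 0.03), runif(50))
g <- seq(0.005, 0.995, by = 0.01)         # 100 grid cell centers per axis
narrow <- kernel_intensity(x, y, sigma = 0.02, g, g)
wide   <- kernel_intensity(x, y, sigma = 0.2,  g, g)
c(max(narrow), max(wide))
```

With the small sigma the cluster appears as a sharp peak, while the large sigma spreads the same points into a broad, low surface (and leaks more mass outside the square), which mirrors the differences between the rows of Figure 4.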
Another useful way of examining point patterns is to plot the distribution of distances from each point to its nearest neighbor. Figure 5 shows histograms for each of the three crime types.
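par(mfrow = c(1, 3))
# distance from each event to its nearest neighbor, in the coordinate units
nnd_homicide <- nndist(homicide)
hist(nnd_homicide, main = "Homicide", xlab = "Nearest neighbor distance")
nnd_car_theft <- nndist(car_theft)
hist(nnd_car_theft, main = "Car theft", xlab = "Nearest neighbor distance")
nnd_begging <- nndist(begging)
hist(nnd_begging, main = "Begging", xlab = "Nearest neighbor distance")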
Figure 5
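While the three crimes have similar nearest neighbor distance distributions, the x-axes reveal that even the largest nearest neighbor distance in the begging data is considerably shorter than a portion of the homicide distances. This may be a sign of clustering in the begging data, although nearest neighbor distances also shrink as the number of points grows, so the comparison should be read alongside the number of incidents of each crime.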
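Because nearest neighbor distances depend on how many points a pattern contains, one common way to compare them against complete spatial randomness is the Clark-Evans ratio: the observed mean nearest neighbor distance divided by 0.5 / sqrt(lambda), its expected value for a Poisson process with intensity lambda. spatstat provides clarkevans(); the base-R sketch below, with the hypothetical helper clark_evans(), omits edge correction, so values for random patterns tend to sit slightly above 1.

```r
# Hypothetical helper: Clark-Evans aggregation index without edge correction.
clark_evans <- function(x, y, area) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf                       # ignore each point's distance to itself
  observed <- mean(apply(d, 1, min))   # mean nearest neighbor distance
  expected <- 0.5 / sqrt(n / area)     # expected mean NND under CSR at the same intensity
  observed / expected                  # < 1 suggests clustering, > 1 suggests regularity
}

set.seed(10)
n <- 200
clark_evans(runif(n), runif(n), area = 1)
clark_evans(rnorm(n, 0.5, 0.05), rnorm(n, 0.5, 0.05), area = 1)
```

The first call (a random pattern) gives a ratio near 1, while the second (a tight cluster) gives a ratio far below 1. For the crime data, the window area would be needed, for example from spatstat’s area() applied to the window.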
A variety of statistics can be calculated for a point pattern to measure both its central tendency and its variability. For each of the crimes, we will calculate and plot the weighted mean center (using each record’s FREQUENCY value as its weight), the central point feature, and the standard distance.
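For reference, the three statistics as computed in the homicide code in this section can be written as follows, where (x_i, y_i) are the event coordinates, f_i is each record’s FREQUENCY value, n is the number of events, and (\bar{x}, \bar{y}) is the unweighted mean center around which the standard distance is measured.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}, \qquad
\bar{y}_w = \frac{\sum_{i=1}^{n} f_i y_i}{\sum_{i=1}^{n} f_i}

SD = \sqrt{\frac{\sum_{i=1}^{n} \left[ (x_i - \bar{x})^2 + (y_i - \bar{y})^2 \right]}{n}}

i^{*} = \arg\min_{i} \sum_{j \neq i} \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}
```

The central point feature is the event i* whose summed distance to all other events is smallest.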
The code snippet below performs calculations for the homicide data. The calculations for car theft and begging are identical, but excluded for brevity. Figure 6 shows the output plots for each crime.
# coordinates plus marks as a data frame
homicide_df <- as.data.frame(homicide)
str(homicide_df)
homicide_df$FREQUENCY <- as.integer(homicide_df$FREQUENCY)
homicide_df$OBJECTID <- as.integer(homicide_df$OBJECTID)
n <- npoints(homicide)  # number of homicide events

# mean center (unweighted)
xmean_homicide <- mean(homicide$x)
ymean_homicide <- mean(homicide$y)

# weighted mean center, using FREQUENCY as the weight
sumcount <- 0
sumxbar <- 0
sumybar <- 0
for (i in seq_len(n)) {
  sumxbar <- sumxbar + homicide$x[i] * homicide_df$FREQUENCY[i]
  sumybar <- sumybar + homicide$y[i] * homicide_df$FREQUENCY[i]
  sumcount <- sumcount + homicide_df$FREQUENCY[i]
}
xbarw <- sumxbar / sumcount
ybarw <- sumybar / sumcount

# standard distance around the unweighted mean center
d <- 0
for (i in seq_len(n)) {
  sq_dist <- (homicide$x[i] - xmean_homicide)^2 + (homicide$y[i] - ymean_homicide)^2
  d <- d + sq_dist
}
Std_Dist <- sqrt(d / n)
# circle of radius Std_Dist around the mean center, for plotting
bearing <- 1:360 * pi / 180
cx <- xmean_homicide + Std_Dist * cos(bearing)
cy <- ymean_homicide + Std_Dist * sin(bearing)
circle <- cbind(cx, cy)

# central point: the event with the smallest summed distance to all other events
sumdist2 <- Inf
for (i in seq_len(n)) {
  x1 <- homicide$x[i]
  y1 <- homicide$y[i]
  recno <- homicide_df$OBJECTID[i]
  sumdist1 <- 0
  for (j in seq_len(n)) {
    if (homicide_df$OBJECTID[j] != recno) {
      sumdist1 <- sumdist1 + sqrt((homicide$x[j] - x1)^2 + (homicide$y[j] - y1)^2)
    }
  }
  if (sumdist1 < sumdist2) {
    sumdist2 <- sumdist1
    xdistmin <- x1
    ydistmin <- y1
  }
}

# plots
plot(Sbnd)
points(homicide$x, homicide$y)
points(xbarw, ybarw, col = "blue", cex = 1.5, pch = 19)          # weighted mean center
points(xdistmin, ydistmin, col = "orange", cex = 1.5, pch = 19)  # central point
lines(circle, col = "red", lwd = 2)                               # standard distance
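The loops above can also be written with vectorized base R, which is shorter and easier to reuse for car theft and begging. The helper names below are hypothetical, and central_point() assumes every point is a distinct record (the loop above instead skips matching OBJECTID values); it builds a full n-by-n distance matrix, which is fine for a few thousand points but grows quadratically.

```r
# Hypothetical helpers: vectorized centrographic statistics.
weighted_mean_center <- function(x, y, w) {
  c(x = sum(w * x) / sum(w), y = sum(w * y) / sum(w))
}

standard_distance <- function(x, y) {
  # root mean squared distance from the unweighted mean center
  sqrt(mean((x - mean(x))^2 + (y - mean(y))^2))
}

central_point <- function(x, y) {
  # index of the point with the smallest summed distance to all others;
  # the zero self-distance on the diagonal does not change the sums
  totals <- rowSums(as.matrix(dist(cbind(x, y))))
  which.min(totals)
}

# small worked example: the four corners of a unit square plus its center
x <- c(0, 1, 0, 1, 0.5)
y <- c(0, 0, 1, 1, 0.5)
w <- c(1, 1, 1, 1, 4)
weighted_mean_center(x, y, w)   # (0.5, 0.5) by symmetry
standard_distance(x, y)         # sqrt(0.4), about 0.632
central_point(x, y)             # the 5th point, the center
```

With the crime data, the same helpers could be called as, for example, weighted_mean_center(homicide$x, homicide$y, homicide_df$FREQUENCY).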
Figure 6 shows three distinct point patterns:
Additional analysis could include quantifying the relationships between these three point patterns as well as expanding beyond just these three selected crimes.
It is important to note the social context surrounding these data. In 2014, nearby Ferguson, MO, saw large-scale protests following the police killing of 18-year-old Michael Brown. Crime data analysis should be conducted with this context in mind, and ethical considerations, such as policing all parts of the community equally, should be taken into account. Even the reported begging incidents were clustered in high-traffic, relatively affluent, and touristy areas, whereas only a handful of begging incidents were reported in the more impoverished neighborhoods, a pattern that may reflect where enforcement and reporting are concentrated as much as where begging occurs.