Introduction

Point pattern analysis plays a critical role in many fields such as physical geography, ecology, and epidemiology. This report focuses specifically on crime analysis. While the literature discusses dozens of statistical methods for quantifying point patterns, this report covers a handful of them: kernel density estimation, nearest neighbor distances, and descriptive statistics.

Part A

Before working through a real-world example, it is essential to understand point patterns and how the eye can see things that aren’t necessarily there.

Poisson Point Process

R’s spatstat package makes it easy to quickly create sample datasets for testing our intuition about clustering. In Figure 1, the rpoispp() function was used to generate three random point patterns of roughly 100 points each (the argument 100 is the intensity, i.e., the expected number of points per unit area of the unit square).

library(spatstat)   # provides rpoispp() and the other point pattern functions used in this report

par(mfrow = c(1, 3))

# three independent realizations of a homogeneous Poisson process on the unit square,
# each with intensity 100 (an expected 100 points per unit area)
pp1 <- rpoispp(100)
plot(pp1)
pp2 <- rpoispp(100)
plot(pp2)
pp3 <- rpoispp(100)
plot(pp3)

Although all of these points are randomly generated by a Poisson point process and meet the definition of complete spatial randomness (each location has an equal chance of containing an event, and events are located independently of one another), our brains can still manufacture what seem to be meaningful clusters from the data. These “cluster hallucinations” can be scale-dependent: when zooming in on a given patch of the unit square, viewers may notice that some point locations are nearly identical, but this is merely by chance. When looking for potential first-order effects, it is generally apparent that the points are spread randomly across the unit square; however, this is not always the case with point processes, as the next example shows.
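
This visual impression can also be checked numerically. As a minimal sketch (not part of the original figure; it assumes pp1 from above is still in memory), spatstat’s quadrat.test() compares the observed quadrat counts against the counts expected under complete spatial randomness:

# chi-squared quadrat test of CSR on a 5 x 5 grid of quadrats
qt <- quadrat.test(pp1, nx = 5, ny = 5)
qt                               # a large p-value is consistent with CSR
plot(pp1)
plot(qt, add = TRUE, cex = 0.6)  # overlay observed/expected counts on the pattern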

Poisson Point Process with a Function

The rpoispp() function also accepts a function of (x, y) as its intensity argument, which modifies the distribution of points across the unit square. In this example (Figure 2), every location still has some chance of containing a point; however, the intensity function weights locations farther from the center more heavily than those directly in the center.

par(mfrow = c(1, 3))

# intensity grows linearly with distance from the center of the unit square
lambda <- function(x, y) { 200 * sqrt((x - 0.5)^2 + (y - 0.5)^2) }

pp4 <- rpoispp(lambda)
plot(pp4)
pp5 <- rpoispp(lambda)
plot(pp5)
pp6 <- rpoispp(lambda)
plot(pp6)

In Figure 2, the dispersion of points away from the center is noticeable. As with the homogeneous process in Figure 1, our brains can still see phantom patterns, especially depending on the scale at which the point pattern is viewed.

Processes with Interactions Between Events

R can also create point patterns in which the location of one point depends on the locations of other points. In this example, the rSSI() and rThomas() functions are used:

  • rSSI() accepts two parameters: an inhibition distance and a number of events. The inhibition distance is the minimum distance any point can be from another. If more points are requested than can fit at that spacing, the function returns a warning message.
  • rThomas() accepts three parameters: the intensity of the parent (cluster-center) process, the cluster size expressed as the standard deviation of a normally distributed displacement from the cluster center, and the mean number of points per cluster. The function is useful for creating clustered point patterns, in contrast to the previously discussed functions, which tend to produce random or dispersed patterns.

Figure 3 shows the output of each function.

par(mfrow = c(1, 2))

pp7 <- rSSI(0.05, 100)        # simple sequential inhibition: 100 points, minimum spacing 0.05
plot(pp7)
pp8 <- rThomas(0.9, 0.2, 10)  # parent intensity 0.9, cluster sd 0.2, mean of 10 points per cluster
plot(pp8)

The rSSI() output (pp7) appears less clustered than the output of rThomas() (pp8), which is the intended outcome. Given the low parent intensity specified for rThomas() (0.9, i.e., slightly less than one expected cluster center in the unit window), only one cluster was generated in this iteration.
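
To confirm this reading of the parameters, a quick illustrative tweak (not part of Figure 3; pp9 is a hypothetical name) is to raise the parent intensity so that several cluster centers are expected in the unit window:

# with kappa = 10, roughly ten cluster centers are expected in the unit square
pp9 <- rThomas(10, 0.05, 10)
plot(pp9)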

Part B

Recognizing that some random point patterns can appear to be clustered or uniform, the second part of this project focuses on a real-world example: analyzing the point patterns of reported crimes in St. Louis, MO, from 2013 to 2014.

Data Overview

This analysis will be conducted using two files:

  • stl20132014sube.shp contains 8,738 point features, each representing the location of a reported crime in St. Louis between 2013 and 2014. Other relevant fields include “crimet2” (the type of crime reported) and “FREQUENCY” (the number of instances of that crime at that location during the specified period).
  • stl_boundary.shp contains the polygon outline of the city boundaries. This will be used to define the study area.

library(sf)        # read_sf() for loading the shapefiles
library(spatstat)  # as.owin() and ppp() for building point patterns

crimes_shp_prj <- read_sf("C:/temp/PSU_operational/GEOG586/Lesson3/Geog586_Les3_Project/gis/stl20132014sube.shp")
StLouis_BND_prj <- read_sf("C:/temp/PSU_operational/GEOG586/Lesson3/Geog586_Les3_Project/gis/stl_boundary.shp")

# use the city boundary polygon as the observation window
Sbnd <- as.owin(StLouis_BND_prj)

Before analysis begins, the crime data needs to be converted to a point pattern dataset in R.

crimes_marks <- data.frame(crimes_shp_prj)
crimesppp <- ppp(crimes_shp_prj$MEAN_XCoor, crimes_shp_prj$MEAN_YCoor, window = Sbnd, marks = crimes_marks)
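
As a quick sanity check (an illustrative addition rather than part of the lesson workflow), the size of the new point pattern can be compared against the 8,738 features in the shapefile; ppp() warns about, and drops, any points that fall outside the window:

npoints(crimesppp)   # should be close to the 8,738 input features
summary(crimesppp)   # window, overall intensity, and a summary of the marks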

Only three crime types (homicide, car theft, and begging) will be discussed in this analysis. Each crime displays a unique point pattern.

  • homicide: 200 reported locations
  • car theft: 2,015 reported locations
  • begging: 278 reported locations

# subset the marked point pattern by crime type
homicide <- crimesppp[crimes_shp_prj$crimet2 == "homicide"]
car_theft <- crimesppp[crimes_shp_prj$crimet2 == "car_theft"]
begging <- crimesppp[crimes_shp_prj$crimet2 == "begging"]
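
The counts listed above can be confirmed directly from the subsets (again, an illustrative check rather than part of the original workflow):

npoints(homicide)    # expected: 200
npoints(car_theft)   # expected: 2015
npoints(begging)     # expected: 278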

Kernel Density

The bandwidth of a kernel can play a significant role in the conclusions drawn from the analysis. In order to get an initial look at each of the three point patterns and see the role bandwidth can play in kernel density analysis, Figure 4 shows a kernel density for each crime type (columns) and different bandwidths (rows).

# cross-validated ("optimized") bandwidth for each crime type
homicide_opt <- bw.diggle(homicide)
car_theft_opt <- bw.diggle(car_theft)
begging_opt <- bw.diggle(begging)

par(mfrow = c(4, 3))   # rows: bandwidth choices; columns: crime types

# homicide, default bandwidth
KHom_default <- density(homicide)
plot(KHom_default, main = NULL, las = 1)

# car theft default bandwidth
KCar_default <- density(car_theft)
plot(KCar_default, main = NULL, las = 1)

# begging default bandwidth
KBeg_default <- density(begging)
plot(KBeg_default, main = NULL, las = 1)

# homicide optimized bandwidth
KHom_opt <- density(homicide, sigma = homicide_opt)
plot(KHom_opt, main = NULL, las = 1)

# car theft, homicide optimized bandwidth
KCar_hom_opt <- density(car_theft, sigma = homicide_opt)
plot(KCar_hom_opt, main = NULL, las = 1)

# begging, homicide optimized bandwidth
KBeg_hom_opt <- density(begging, sigma = homicide_opt)
plot(KBeg_hom_opt, main = NULL, las = 1)

# homicide, car theft optimized bandwidth
KHom_car_opt <- density(homicide, sigma = car_theft_opt)
plot(KHom_car_opt, main = NULL, las = 1)

# car theft optimized bandwidth
KCar_opt <- density(car_theft, sigma = car_theft_opt)
plot(KCar_opt, main = NULL, las = 1)

# begging, car theft optimized bandwidth
KBeg_car_opt <- density(begging, sigma = car_theft_opt)
plot(KBeg_car_opt, main = NULL, las = 1)

# homicide, begging optimized bandwidth
KHom_beg_opt <- density(homicide, sigma = begging_opt)
plot(KHom_beg_opt, main = NULL, las = 1)

# car theft, begging optimized bandwidth
KCar_beg_opt <- density(car_theft, sigma = begging_opt)
plot(KCar_beg_opt, main = NULL, las = 1)

# begging optimized bandwidth
KBeg_opt <- density(begging, sigma = begging_opt)
plot(KBeg_opt, main = NULL, las = 1)

As Figure 4 shows, the perceived intensity of a point pattern can be greatly affected by the choice of bandwidth. While the “optimized” bandwidth from bw.diggle() produced a more informative surface than the default for each crime, I found that some bandwidths optimized for one crime fit another crime’s data even better (the car theft bandwidth applied to the begging data, for example).
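
For reference, the three bandwidths chosen by bw.diggle() can be printed side by side (an illustrative check; the values are in the units of the projected coordinate system and are not reproduced here):

# cross-validated bandwidths for each crime type, in map units
c(homicide = homicide_opt, car_theft = car_theft_opt, begging = begging_opt)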

Nearest Neighbor Distance

Another useful way of examining point patterns is to plot the distribution of distances from each point to its nearest neighbor. Figure 5 shows histograms for each of the three crime types.

par(mfrow = c(1, 3))

# nearest neighbor distance for every event of each crime type
nnd_homicide <- nndist(homicide)
hist(nnd_homicide)
nnd_car_theft <- nndist(car_theft)
hist(nnd_car_theft)
nnd_begging <- nndist(begging)
hist(nnd_begging)

Figure 5: Nearest neighbor distance histograms for homicide, car theft, and begging.

While the three crimes have similarly shaped nearest neighbor distance distributions, looking at the x-axes reveals that even the largest nearest neighbor distance in the begging data is considerably smaller than many of the nearest neighbor distances in the homicide data. This may be a sign of clustering in the begging data.
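
Summary statistics provide a numeric counterpart to the histograms (an illustrative check; the snippet below simply tabulates the mean and maximum nearest neighbor distance for each crime type):

# mean and maximum nearest neighbor distance per crime type
sapply(list(homicide = nnd_homicide, car_theft = nnd_car_theft, begging = nnd_begging),
       function(d) c(mean = mean(d), max = max(d)))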

Descriptive Statistics

A variety of statistics can be calculated for a point pattern to measure both its central tendency and its variability. For each crime, we will calculate and plot the weighted mean center (using the FREQUENCY variable for each record as the weight), the central point feature (the event with the smallest total distance to all other events), and the standard distance.

The code snippet below performs calculations for the homicide data. The calculations for car theft and begging are identical, but excluded for brevity. Figure 6 shows the output plots for each crime.

# convert the ppp object to a data frame (x, y, plus the mark columns)
homicide_df <- data.frame(homicide)
str(homicide_df)

homicide_df$FREQUENCY <- as.integer(homicide_df$FREQUENCY)
homicide_df$OBJECTID <- as.integer(homicide_df$OBJECTID)

# point coordinates; length(a[[1]]) gives the number of homicide events
a <- list(homicide$x)

# mean center
xmean_homicide <- mean(homicide$x)
ymean_homicide <- mean(homicide$y)

# weighted mean center, using FREQUENCY as the weight for each location
sumcount = 0
sumxbar = 0
sumybar = 0
for(i in 1:length(a[[1]])){
  xbar <- (homicide$x[i] * homicide_df$FREQUENCY[i])
  ybar <- (homicide$y[i] * homicide_df$FREQUENCY[i])
  sumxbar = xbar + sumxbar
  sumybar = ybar + sumybar
  sumcount <- homicide_df$FREQUENCY[i] + sumcount
}

xbarw <- sumxbar/sumcount
ybarw <- sumybar/sumcount

# standard distance: root mean squared distance of events from the (unweighted) mean center
d = 0
for(i in 1:length(a[[1]])){
  sqdist <- ((homicide$x[i] - xmean_homicide)^2 + (homicide$y[i] - ymean_homicide)^2)
  d <- (d + sqdist)
}
Std_Dist <- sqrt(d / length(a[[1]]))

# circle of radius Std_Dist around the mean center, for plotting
bearing <- 1:360 * pi/180
cx <- xmean_homicide + Std_Dist * cos(bearing)
cy <- ymean_homicide + Std_Dist * sin(bearing)
circle <- cbind(cx, cy)

# central point: the event with the smallest total distance to all other events
sumdist2 = Inf
for(i in 1:length(a[[1]])){
  x1 = homicide$x[i]
  y1 = homicide$y[i]
  recno = homicide_df$OBJECTID[i]
  sumdist1 = 0
  for(j in 1:length(a[[1]])){
    recno2 = homicide_df$OBJECTID[j]
    x2 = homicide$x[j]
    y2 = homicide$y[j]
    if(recno != recno2){
      dist1 <- sqrt((x2 - x1)^2 + (y2 - y1)^2)
      sumdist1 = sumdist1 + dist1
    }
  }
  if (sumdist1 < sumdist2){
    dist3 <- list(recno, sumdist1, x1, y1)   # id, total distance, and coordinates of the current best
    sumdist2 = sumdist1
    xdistmin <- x1
    ydistmin <- y1
  }
}

# plots: boundary, events, weighted mean center, central point, standard distance circle
plot(Sbnd)
points(homicide$x, homicide$y)
points(xbarw, ybarw, col = "blue", cex = 1.5, pch = 19)             # weighted mean center
points(dist3[[3]], dist3[[4]], col = "orange", cex = 1.5, pch = 19) # central point
lines(circle, col = 'red', lwd = 2)                                 # standard distance circle

Conclusions

Figure 6 shows a distinct point pattern for each of the three crime types.

Additional analysis could include quantifying the relationships between these three point patterns as well as expanding beyond just these three selected crimes.

Important to note is the social context surrounding these data. In 2014, nearby Ferguson, MO saw large-scale protests following the police killing of 18-year-old Michael Brown. Crime data analysis should be conducted with this context in mind, and ethical considerations, such as policing all parts of the community equitably, should guide how the results are used. Notably, the reported begging incidents were clustered in high-traffic, relatively affluent, tourist-oriented areas, while only a handful of begging incidents were reported in the more impoverished neighborhoods.

Sources