Predicting Geolocation of Next Attack

Data Challenge Test

Intro

In 1981 Peter Sutcliffe was convicted of thirteen murders and of subjecting a number of other people to vicious attacks. One of the methods used to narrow the search for Mr. Sutcliffe was to find a “center of mass” of the locations of the attacks. In the end, the suspect happened to live in the same town predicted by this technique. Since that time, a number of more sophisticated techniques have been developed to determine the “geographical profile” of a suspected serial criminal based on the locations of the crimes.

Your team has been asked by a local police agency to develop a method to aid in their investigations of serial criminals. The approach that you develop should make use of at least two different schemes to generate a geographical profile. You should develop a technique to combine the results of the different schemes and generate a useful prediction for law enforcement officers. The prediction should provide some kind of estimate or guidance about possible locations of the next crime based on the time and locations of the past crime scenes. If you make use of any other evidence in your estimate, you must provide specific details about how you incorporate the extra information. Your method should also provide some kind of estimate about how reliable the estimate will be in a given situation, including appropriate warnings.

In addition to the required one-page summary, your report should include an additional two-page executive summary. The executive summary should provide a broad overview of the potential issues. It should provide an overview of your approach and describe situations when it is an appropriate tool and situations in which it is not an appropriate tool. The executive summary will be read by a chief of police and should include technical details appropriate to the intended audience.

Set Up and Load Data

#load packages
library(ggplot2)
library(spatstat)
library(maptools)
library(ggmap)
library(scales)

The data set on Peter Sutcliffe was obtained from a public source. It lists 22 reported victims, attacked between July 1975 and November 1980, together with the coordinates of each attack (x is longitude and y is latitude, in decimal degrees).

#set directory and read in data
setwd("C:/Users/Anthony/Documents/Projects/Neene")
df <- read.csv("yorkshire.csv")

#explore data
head(df)

##   No. Year  Date    first      last       x       y
## 1   1 1975  7.50     Anna Rogulskyj -1.9097 53.8656
## 2   2 1975  8.15    Olive     Smelt -1.8688 53.7401
## 3   3 1975  8.27    Tracy    Browne -1.9378 53.9142
## 4   4 1975 10.30    Wilma    McCann -1.5440 53.8142
## 5   5 1976  1.20    Emily   Jackson -1.5329 53.8079
## 6   6 1976  5.90 Marcella   Claxton -1.5045 53.8380

#viz
ggplot(df, aes(x,y)) + geom_point()
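
The intro mentions a "center of mass" of the attack locations. A minimal sketch of that calculation in base R follows, assuming the x and y columns shown above. To stay self-contained it uses only the six rows printed by head(df), so the result is illustrative and is not the centroid of the full data set. The names attacks, centroid and dist_deg are hypothetical.

```r
# First six attack locations, copied from the head(df) output above
attacks <- data.frame(
  x = c(-1.9097, -1.8688, -1.9378, -1.5440, -1.5329, -1.5045),
  y = c(53.8656, 53.7401, 53.9142, 53.8142, 53.8079, 53.8380)
)

# Center of mass: mean longitude and mean latitude
centroid <- colMeans(attacks[, c("x", "y")])

# Distance of each attack from the centroid, in degrees (not km)
dist_deg <- sqrt((attacks$x - centroid[["x"]])^2 +
                 (attacks$y - centroid[["y"]])^2)

print(centroid)
print(round(dist_deg, 3))
```

With the full data, the same two lines would run on df directly, for example colMeans(df[, c("x", "y")]). Averaging raw degrees ignores the curvature of the earth, which is a reasonable simplification over an area this small.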

Compute Density Function

Using the R package spatstat (spatial point pattern analysis), a kernel-smoothed density is estimated from the first 21 attack locations. The 22nd location is held out to test the model.

#Kernel Smoothed Intensity of Point Pattern
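#hold out the last victim to test the model
train <- df[1:21,]
test <- df[22,]
#point pattern in a window of longitude -2.5 to -1.25, latitude 53.25 to 54
pp <- ppp(train$x, train$y, c(-2.5,-1.25), c(53.25,54))
#Gaussian kernel with bandwidth sigma = 0.1 degrees
dp <- density.ppp(pp, sigma = 0.1)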
plot(dp)
title(main="Density Function")
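
For intuition about what the density values mean, the sketch below evaluates a Gaussian kernel intensity by hand in base R. It is a simplification under stated assumptions, not what density.ppp does internally: it skips the edge correction that density.ppp applies by default, and it works at a single query point rather than on a pixel grid. The function name kernel_intensity is hypothetical, and only the six locations printed by head(df) are used.

```r
# Uncorrected Gaussian kernel intensity at the query point (qx, qy):
# each observed point contributes a 2-D Gaussian bump with sd = sigma
kernel_intensity <- function(px, py, qx, qy, sigma) {
  d2 <- (px - qx)^2 + (py - qy)^2
  sum(exp(-d2 / (2 * sigma^2))) / (2 * pi * sigma^2)
}

# First six attack locations, copied from the head(df) output
px <- c(-1.9097, -1.8688, -1.9378, -1.5440, -1.5329, -1.5045)
py <- c(53.8656, 53.7401, 53.9142, 53.8142, 53.8079, 53.8380)

# Intensity close to points 4 to 6 versus a corner far from all six points
near <- kernel_intensity(px, py, -1.53, 53.81, sigma = 0.1)
far  <- kernel_intensity(px, py, -2.40, 53.30, sigma = 0.1)

print(c(near = near, far = far))
```

Because the coordinates are in degrees, the result is in points per square degree. A single point contributes at most 1/(2*pi*sigma^2), roughly 15.9 for sigma = 0.1, which gives a sense of scale for the values on the density plot.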

Map Out Coordinates on West Yorkshire

#data frame of density function
density <- data.frame(dp)

#map of West Yorkshire
myLocation <- c(-2.5,53.25,-1.25,54)
#note: recent versions of ggmap need a Google API key (see register_google)
myMap <- get_map(location=myLocation,
                 source="google",
                 maptype="terrain",
                 color="bw", crop=TRUE)

#layer in density function and points
ggmap(myMap, legend="topright") +
        geom_tile(data = density,
                  aes(x, y, fill = value),
                  alpha = 0.5, 
                  color = NA) +
        ggtitle("Map Based on First 21 Victims") +
        scale_fill_distiller(palette = "YlOrRd",
                             breaks = pretty_breaks(n = 10), 
                             direction = 1) +
        geom_point(data=train, aes(x,y)) +
        geom_point(data=test, aes(x,y),
                   shape="X", size = 8,
                   alpha=1, color="#0000FF") +
        annotate("text", x=-1.5762, y=53.78,
                 label = "22nd Victim",
                 color="#0000FF",
                 size = 3)

Test Model On 22nd Victim

Looking up the density at the coordinates of the 22nd victim gives a value of about 139 (138.7961), which is very high relative to the density at the other locations. For this single held-out case, the model does a good job of predicting the location of the next victim.

test_x <- test$x
test_y <- test$y

#keep the grid cells whose coordinates match the test point to 2 decimals
pred <- subset(density,
               round(density$x,2) == round(test_x, 2) &
               round(density$y,2) == round(test_y, 2))
mean(pred$value)

## [1] 138.7961
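
One way to put a number like this in context is to ask what share of grid cells have a value at or below it. The sketch below shows that idea, along with a nearest-cell lookup as an alternative to matching on rounded coordinates. Both helper names (nearest_cell_value, value_percentile) are hypothetical, and the 3 x 3 grid holds made-up values that only exercise the helpers; it is not output from the model.

```r
# `grid` is any data frame with columns x, y, value, like data.frame(dp)

# Value of the grid cell whose center is closest to (qx, qy)
nearest_cell_value <- function(grid, qx, qy) {
  d2 <- (grid$x - qx)^2 + (grid$y - qy)^2
  grid$value[which.min(d2)]
}

# Share of grid cells whose value is at or below v (between 0 and 1)
value_percentile <- function(grid, v) {
  mean(grid$value <= v)
}

# Toy 3 x 3 grid with made-up values, for illustration only
toy <- expand.grid(x = c(0, 1, 2), y = c(0, 1, 2))
toy$value <- c(1, 2, 3, 4, 9, 6, 7, 8, 5)

v <- nearest_cell_value(toy, 1.1, 0.9)  # closest cell center is (1, 1)
p <- value_percentile(toy, v)
print(c(value = v, percentile = p))
```

With the objects from this report, the equivalent calls would be nearest_cell_value(density, test$x, test$y) followed by value_percentile(density, ...) on the result. A percentile close to 1 would support describing the predicted value as very high.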

Executive Summary

A kernel density model was built from the coordinates of the first 21 attacks. The model assigns each area a value, with higher values representing a higher likelihood of another attack. Evaluating the model at the location of the 22nd attack gave a very high value: the model had marked out an area of high interest, and that is where the next victim was attacked.

This model could have been used to focus search efforts on high-risk areas, which might have helped prevent the next attack.