Research question

Using police crime records from 2018-05 to 2019-05, can we produce a spatial map of crime type distributions across Greater London and cluster these to gain further insight into how crime varies over space? Will this help distribute the correct type of resources to tackle hotspots of specific crime types? Is there a correlation between drug-related incidents and incidents of anti-social behaviour across London? This study investigates these questions using data visualisation and machine learning, specifically cluster analysis. The data source is: https://data.police.uk/data/

Table of Contents

  1. Executive Summary
  2. Introduction
  3. Data
  4. Background
  5. Methods and Results
  6. Conclusion
  7. Recommendations
  8. References

Executive Summary

Crime data provided by police forces in London over a 12-month period are used to produce multiple visualisations of crime type distributions over space and time. Machine learning techniques, specifically cluster analysis, are then applied to the data to derive crime type hotspots across London, which should aid police resource allocation.

Introduction

The government has decided to investigate new techniques that can be deployed in policing to make up for a lack of resources. Quantitative policing has emerged with the intention of adopting Artificial Intelligence (AI) methods and applying them to conventional policing, ultimately increasing the capabilities of UK police forces. In this study, modern Data Science and AI techniques are used to understand how crime evolves over time in London at street level. Two further questions follow. How can the relationships between different crime types be explored? And how can resource allocation be adapted to the type of crime occurring at a hotspot, so that only resources equipped to deal with that specific crime type are deployed? Ultimately, this study should provide police forces with better resource allocation planning procedures. London is chosen as the case study because its population is larger than that of any other UK city (meaning more data) and because crime types are more diverse across Greater London (meaning more clusters). The method is reproducible with crime data of any type for any region in the world.

The study will follow the plan:

  • Download two datasets, one for the City of London Police and one for the Metropolitan Police Service, because both forces operate in London.
  • Merge the two datasets, then filter out the incidents whose coordinates fall outside the boundary of London. This is needed because the Metropolitan Police Service records include incidents outside London.
  • Clean the crime records and aggregate them to street level with coordinates.
  • Pin each crime location, labelled with its crime type, on an interactive map of London. Only a subset of the data can be plotted interactively, as this requires a large amount of computation.
  • Create multiple visualisations of the data to gain insight.
  • Use cluster analysis on the maps to estimate the probability of various crimes occurring in certain areas, and how these can affect neighbouring wards.
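The boundary filter in the second step of the plan can be approximated with a simple bounding box. The sketch below is only an illustration: the sample rows and the bounding-box values are assumptions rather than values taken from the study, and a point-in-polygon test against the ward shapefile would be more precise.

```r
# Hypothetical sample: two points in central London and one well to the west.
crimes <- data.frame(
  Longitude  = c(-0.114954, -0.097409, -0.595000),
  Latitude   = c(51.51863, 51.52114, 51.51000),
  Crime.type = c("Burglary", "Drugs", "Robbery"),
  stringsAsFactors = FALSE
)

# Approximate bounding box for Greater London (assumed values, not read
# from the shapefile used in the study).
bbox <- list(lon_min = -0.51, lon_max = 0.34, lat_min = 51.28, lat_max = 51.70)

# TRUE for rows with non-missing coordinates that fall inside the box.
in_box <- function(lon, lat, box) {
  !is.na(lon) & !is.na(lat) &
    lon >= box$lon_min & lon <= box$lon_max &
    lat >= box$lat_min & lat <= box$lat_max
}

london_crimes <- crimes[in_box(crimes$Longitude, crimes$Latitude, bbox), ]
```

A bounding box keeps some points that lie just outside the true boundary, so it is best seen as a cheap first pass before any polygon-based filter.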

Data

The data and methods used in this study build on the research conducted in (Malleson and Andresen 2016). The data must first be cleaned and preprocessed before they can be used effectively. This chapter describes the steps taken to load the data into the project and clean them for use in later chapters.

Data wrangling

There are 24 separate .csv files, which all need to be merged into one. A short script reads the names of the separate CSV files in each subfolder and merges them. The merged data are then imported into a dataframe and saved as a single CSV file that is easy to re-import and use. This step is necessary because both the Metropolitan Police Service and the City of London Police operate in London.
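The merge step described above might look roughly like the following sketch. It is not the study's own script: the folder layout, file names and sample rows are hypothetical, created in a temporary directory so that the example runs on its own.

```r
# Hypothetical folder layout so the sketch runs on its own:
# one subfolder per month, one CSV per police force.
root <- file.path(tempdir(), "police_data_sketch")
unlink(root, recursive = TRUE)

write_sample <- function(month, force, crime_type) {
  dir.create(file.path(root, month), recursive = TRUE, showWarnings = FALSE)
  rows <- data.frame(
    Month = month, Longitude = -0.114954, Latitude = 51.51863,
    LSOA.code = "E01000914", LSOA.name = "Camden 028B",
    Crime.type = crime_type
  )
  path <- file.path(root, month, paste0(month, "-", force, "-street.csv"))
  write.csv(rows, path, row.names = FALSE)
}
write_sample("2018-05", "city-of-london", "Burglary")
write_sample("2018-05", "metropolitan", "Drugs")
write_sample("2018-06", "metropolitan", "Robbery")

# Collect every CSV in every subfolder, read each one, and stack the rows.
csv_files <- list.files(root, pattern = "\\.csv$", recursive = TRUE,
                        full.names = TRUE)
crimes <- do.call(rbind, lapply(csv_files, read.csv, stringsAsFactors = FALSE))

# Save the merged data once so later steps can re-import a single file.
merged_path <- file.path(tempdir(), "all_crimes_sketch.csv")
write.csv(crimes, merged_path, row.names = FALSE)
```

Stacking with rbind assumes every file has the same columns in the same order. The merged file is written outside the source folder so that a second run does not pick it up as an input.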

With the data merged into a single file, the first six rows of the dataframe are presented below. The columns of interest are: the month in which the incident occurred, the longitude and latitude of the incident, the LSOA code and name to locate specific incidents, and the crime type for cluster analysis.

##     Month Longitude Latitude LSOA.code           LSOA.name
## 1 2018-05 -0.114954 51.51863 E01000914         Camden 028B
## 2 2018-05 -0.111497 51.51823 E01000914         Camden 028B
## 3 2018-05 -0.111497 51.51823 E01000914         Camden 028B
## 4 2018-05 -0.097409 51.52114 E01000001 City of London 001A
## 5 2018-05 -0.097601 51.52070 E01000001 City of London 001A
## 6 2018-05 -0.097601 51.52070 E01000001 City of London 001A
##              Crime.type
## 1 Anti-social behaviour
## 2              Burglary
## 3          Public order
## 4 Anti-social behaviour
## 5 Anti-social behaviour
## 6 Anti-social behaviour

The column names are changed to be more informative, e.g. “LSOA.code” is changed to “LSOA code”, and so on.

In the LSOA name column, each value ends with a four-character suffix made up of three digits and one letter (for example “028B”). This could cause problems in later analysis, so the suffix is removed by trimming the last four characters.
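A minimal base R sketch of the renaming and trimming steps follows. The two sample rows are copied from the preview above; the final whitespace strip is an extra step assumed here, not one described in the study.

```r
crimes <- data.frame(
  LSOA.code  = c("E01000914", "E01000001"),
  LSOA.name  = c("Camden 028B", "City of London 001A"),
  Crime.type = c("Burglary", "Anti-social behaviour"),
  stringsAsFactors = FALSE
)

# Replace the dots in the column names with spaces.
names(crimes) <- gsub(".", " ", names(crimes), fixed = TRUE)

# Drop the last four characters (three digits and one letter).
lsoa <- crimes[["LSOA name"]]
trimmed <- substr(lsoa, 1, nchar(lsoa) - 4)

# Trimming four characters leaves a trailing space, so strip it as well.
crimes[["LSOA name"]] <- trimws(trimmed)
```

Column names that contain spaces have to be accessed with `[[ ]]` or backticks, which is the price of the more readable labels.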

It is important to know the scale of the data being used, since processing may be difficult if the computer used is not powerful enough. In this study there are 1,177,958 rows in total, that is, 1,177,958 reported crimes across Greater London from 2018-05 to 2019-05.

## [1] 1177958

Background

This chapter describes the context that led to the research questions and the reasons for conducting the study. The mental health crisis across the UK has put a strain on emergency services, specifically the ambulance and police services (Dustmann and Fasani 2016). Conventional resources that were distributed to tackle issues in crime hotspots were being redirected to situations that they were not equipped to handle, for example Community Support Officers being called to incidents where the perpetrator is suffering from mental illness. This study therefore focuses on how modern AI and Data Science techniques can be used to make the police forces in London more effective with fewer resources. It is also important to better understand the correlation between crime types: for instance, does drug use (crime type: Drugs) increase the chance of a person subsequently behaving anti-socially (crime type: Anti-social behaviour)? This study argues that the probability of committing a crime is high once a person has already committed one, and that this can only be quantified if both the amount and the type of crime occurring in a location are known (Whitley and Prince 2005). Finally, several public datasets relate to crime and law enforcement in general; the most populated to date is the https://data.police.uk/data/ data source, which is why it is used in this study.

Next, the types of crime recorded must be explored to understand which category each incident falls under. The following output lists them.

##  [1] "Anti-social behaviour"        "Burglary"                    
##  [3] "Public order"                 "Bicycle theft"               
##  [5] "Drugs"                        "Other theft"                 
##  [7] "Theft from the person"        "Vehicle crime"               
##  [9] "Violence and sexual offences" "Other crime"                 
## [11] "Criminal damage and arson"    "Possession of weapons"       
## [13] "Robbery"                      "Shoplifting"

Next, we check that the LSOA name column no longer has the four-character suffix at the end of each ward name. The output below confirms this, although each name retains a trailing space.

##  [1] "Camden "                 "City of London "        
##  [3] "Islington "              "Southwark "             
##  [5] "Tower Hamlets "          "Westminster "           
##  [7] "Hackney "                "Waltham Forest "        
##  [9] "Lambeth "                "Newham "                
## [11] "Haringey "               "Redbridge "             
## [13] "Barking and Dagenham "   "Barnet "                
## [15] "Bexley "                 "Brent "                 
## [17] "Brentwood "              "Bromley "               
## [19] "Croydon "                "Dartford "              
## [21] "Ealing "                 "Elmbridge "             
## [23] "Enfield "                "Epping Forest "         
## [25] "Epsom and Ewell "        "Greenwich "             
## [27] "Hammersmith and Fulham " "Harrow "                
## [29] "Havering "               "Hertsmere "             
## [31] "Hillingdon "             "Hounslow "              
## [33] "Kensington and Chelsea " "Kingston upon Thames "  
## [35] "Lewisham "               "Merton "                
## [37] "Reigate and Banstead "   "Richmond upon Thames "  
## [39] "Runnymede "              "Sevenoaks "             
## [41] "South Bucks "            "Spelthorne "            
## [43] "Sutton "                 "Tandridge "             
## [45] "Three Rivers "           "Thurrock "              
## [47] "Wandsworth "             "Watford "               
## [49] "Welwyn Hatfield "        "Broxbourne "            
## [51] "Mole Valley "            "Slough "                
## [53] "Woking "                 "Gravesham "             
## [55] "Tonbridge and Malling "  "Guildford "

The question “is there a correlation between drug-related incidents and incidents of anti-social behaviour across London?” is best focused on a subset of the data, as there are many wards and crime types. We therefore first identify the ward with the most crime, since more incidents mean more data, and then analyse whether there is a relationship between incidents of drug use and anti-social behaviour in that ward.
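Finding the ward with the most incidents and narrowing the data to the two crime types of interest can be sketched in base R as follows. The data frame, its `Ward` column and its counts are hypothetical stand-ins for the cleaned data.

```r
# Hypothetical cleaned data: one row per reported incident.
crimes <- data.frame(
  Ward = c("Westminster", "Westminster", "Westminster",
           "Camden", "Camden", "Hackney"),
  Crime.type = c("Drugs", "Anti-social behaviour", "Other theft",
                 "Burglary", "Drugs", "Robbery"),
  stringsAsFactors = FALSE
)

# Count incidents per ward and take the ward with the highest count.
ward_counts <- sort(table(crimes$Ward), decreasing = TRUE)
top_ward <- names(ward_counts)[1]

# Keep only the two crime types of interest within that ward.
types_of_interest <- c("Drugs", "Anti-social behaviour")
top_ward_crimes <- crimes[crimes$Ward == top_ward &
                            crimes$Crime.type %in% types_of_interest, ]
```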

Methods and Results

Some of the methods used in this study are:

  • Data preprocessing.
  • Data visualisation using R packages such as ggplot2 and leaflet.
  • Machine learning, specifically cluster analysis (KMeans), to identify crime hotspots.

The results are presented below after the use of each method.

Interactive map using Leaflet

The R package leaflet is used here to overlay the data on an interactive map, in order to explore the data further and gain insights. Leaflet is an open-source JavaScript library that can be used from R to create mobile-friendly interactive maps. As plotting over a million data points on an interactive map is computationally expensive, a random sample of incidents spread evenly across London is plotted instead.
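One way to draw a manageable sample for the interactive map is sketched below. It assumes that "even" means an equal number of incidents per crime type, which is only one possible reading of the sampling described above. The data are simulated, and the leaflet calls run only if the package is installed.

```r
set.seed(42)

# Simulated incidents (hypothetical coordinates and crime types).
n <- 1000
crimes <- data.frame(
  Longitude  = runif(n, -0.51, 0.33),
  Latitude   = runif(n, 51.29, 51.69),
  Crime.type = sample(c("Drugs", "Burglary", "Robbery", "Shoplifting"),
                      n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Draw up to `per_type` random rows from each crime type.
per_type <- 25
rows_by_type <- split(seq_len(nrow(crimes)), crimes$Crime.type)
idx <- unlist(lapply(rows_by_type, function(rows) {
  rows[sample.int(length(rows), min(per_type, length(rows)))]
}))
crime_sample <- crimes[idx, ]

# Plot the sample with clustered markers if leaflet is available.
if (requireNamespace("leaflet", quietly = TRUE)) {
  crime_map <- leaflet::leaflet(data = crime_sample)
  crime_map <- leaflet::addTiles(crime_map)
  crime_map <- leaflet::addMarkers(
    crime_map, lng = ~Longitude, lat = ~Latitude, label = ~Crime.type,
    clusterOptions = leaflet::markerClusterOptions()
  )
}
```

Sampling by crime type keeps rare categories visible on the map, at the cost of no longer reflecting the true proportions of each type.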

The London_Ward_CityMerged.shp base map was downloaded from https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london and used to plot the crime report data. This was necessary because, without a boundary map, plots of geospatial data would not be very informative.

The following map depicts the shapefile used in this study for plotting and clustering.

The following interactive map is used to navigate the wards of Greater London. The map is accessible at http://rpubs.com/SedarOlmez94/Sheffield_Module and can be used to browse the spatial units on which the crime data are based.

The next interactive map allows you to navigate the spatial crime data at street level. Each marker cluster aggregates the incidents that occurred within the polygon it covers. The visualisation makes clear that the proportion of crime is higher in the City of Westminster, which suggests that it is a hotspot for diverse crime types and should be analysed further. The map reinforces the points made in (Sutherland, Brunton-Smith, and Jackson 2013) and is hosted at http://rpubs.com/SedarOlmez94/Sheffield_Module

Next, scatter plots of the data are created with a label for each crime type to investigate where each type of crime is occurring. Finally, more advanced cluster analysis is attempted to see which type of incident occurs most frequently in various locations across Greater London.

The next visualisation is not very informative analytically; however, it is the initial step in plotting crime type distributions across Greater London. The graph contains all 1.1 million incident data points, which are clearly too densely packed to distinguish, so a different type of visualisation, such as bar graphs, could be used here.

Before the data are visualised using bar graphs, some important queries are worth examining. The first plots the number of drug-related incidents in Greater London in January 2019. January is chosen arbitrarily: the choice of month matters little, because the aim is to see whether drug-related incidents have a relationship with incidents of anti-social behaviour. The graph shows that central London is a hub of drug-related incidents and that their number decreases as you move out of the central area.

The following graph shows something very interesting: incidents of anti-social behaviour appear to be as widely dispersed outside the City of London as in the central area. It is possible that the root cause of many of these anti-social incidents is related to drug use (Stafford, Chandola, and Marmot 2007).

The following bar chart shows that the most frequently committed crimes are violence and sexual offences, anti-social behaviour, vehicle crime and other theft. It can be argued that drug-related incidents are not related to anti-social behaviour, because the numbers of incidents are nowhere near each other. A counter-argument is that drug use in the City of London may be related to anti-social behaviour, but that in the more deprived urban areas outside the city anti-social behaviour is driven less by drug use and more by other factors, such as poverty and lack of opportunity.

The next graph shows the number of incidents that occurred in each ward. As mentioned before, the number of incidents is much higher in the City of Westminster. This could be because it is a very popular tourist destination, houses some of the most expensive stores and has affluent residents, which could make it a crime hotspot, as more can be gained from committing an offence there.

Bar graphs can also show the numbers of drug-related crimes and incidents of anti-social behaviour over the one-year period in Westminster. The following graph clearly shows that the number of drug-related crimes is much smaller than that of anti-social behaviour; however, when one increases, the other also increases. The month with the most occurrences of both is 2018-08.
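The observation that the two series rise and fall together can be checked numerically with a Pearson correlation of monthly counts. The sketch below uses made-up counts for four months, not the study's data, and its column names are assumptions.

```r
# Made-up monthly incidents for one ward (not the study's data).
# Each month lists its Drugs rows first, then its Anti-social behaviour rows.
months      <- c("2018-06", "2018-07", "2018-08", "2018-09")
drug_counts <- c(1, 2, 3, 1)
asb_counts  <- c(4, 5, 8, 5)

crimes <- data.frame(
  Month = rep(months, times = drug_counts + asb_counts),
  Crime.type = unlist(lapply(seq_along(months), function(i) {
    rep(c("Drugs", "Anti-social behaviour"),
        times = c(drug_counts[i], asb_counts[i]))
  })),
  stringsAsFactors = FALSE
)

# Cross-tabulate month by crime type to get one count per month and type.
monthly <- table(crimes$Month, crimes$Crime.type)
drugs <- as.vector(monthly[, "Drugs"])
asb   <- as.vector(monthly[, "Anti-social behaviour"])

# Pearson correlation between the two monthly series.
r <- cor(drugs, asb)

# Month with the highest combined count.
peak_month <- rownames(monthly)[which.max(drugs + asb)]
```

With only about a dozen monthly points the estimate is noisy, and a high correlation would not by itself show that one crime type causes the other.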

The following graph uses machine learning, specifically cluster analysis. A range of clustering algorithms was first tried within a loop, and the most suitable one for the data was picked: KMeans. Fourteen clusters were used, one for each crime type, and each cluster indicates the type of crime with the highest probability of occurrence in that area.

The following table presents the crime types and the cluster each belongs to. The table is calculated from the proportion of each crime type occurring in a cluster and the probability of a crime in a neighbouring cluster being of the same type.

##                   Type of crime Cluster
## 1                   Shoplifting       1
## 2         Possession of weapons       2
## 3         Anti-social behaviour       3
## 4                   Other crime       4
## 5                 Vehicle crime       5
## 6                      Burglary       6
## 7         Theft from the person       7
## 8                 Bicycle theft       8
## 9     Criminal damage and arson       9
## 10                        Drugs      10
## 11                      Robbery      11
## 12                 Public order      12
## 13 Violence and sexual offences      13
## 14                  Other theft      14
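A table of this kind could be derived along the following lines. This is a sketch under assumptions, not the study's pipeline: it clusters simulated incidents on their coordinates with `kmeans` from base R's stats package, then labels each cluster with its most frequent crime type. It uses three crime types where the study used fourteen, and it leaves out the neighbouring-cluster probability.

```r
set.seed(1)

# Simulated data: three spatial groups, each dominated by one crime type.
centres <- data.frame(lon = c(-0.14, -0.08, 0.00),
                      lat = c(51.50, 51.52, 51.55))
types <- c("Drugs", "Burglary", "Robbery")
n_per <- 50
crimes <- do.call(rbind, lapply(seq_along(types), function(i) {
  data.frame(
    Longitude  = rnorm(n_per, centres$lon[i], 0.003),
    Latitude   = rnorm(n_per, centres$lat[i], 0.003),
    Crime.type = types[i],
    stringsAsFactors = FALSE
  )
}))

# One cluster per crime type, as in the study (which used k = 14).
k <- length(unique(crimes$Crime.type))
fit <- kmeans(crimes[, c("Longitude", "Latitude")], centers = k, nstart = 20)
crimes$Cluster <- fit$cluster

# Most frequent crime type in each cluster, with its share of the cluster.
counts <- table(crimes$Cluster, crimes$Crime.type)
dominant <- data.frame(
  Cluster = as.integer(rownames(counts)),
  Type    = colnames(counts)[apply(counts, 1, which.max)],
  Share   = apply(counts, 1, max) / rowSums(counts)
)
```

KMeans on raw longitude and latitude treats degrees as Euclidean distances, which is a reasonable approximation over an area the size of London but distorts east-west distances slightly. Real data are also unlikely to yield a distinct dominant type for every cluster, as this simulated example does by construction.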

Conclusion

This project was very exciting, as it combined technical skills and sociological analysis. Getting the data was not difficult; however, merging it all together automatically and then being able to use only a subset for the interactive graphs was a shortcoming. The study set out to answer the following questions:

  • Using police crime records from 2018-05 to 2019-05, can we produce a spatial map of crime type distributions across Greater London and cluster these to gain further insight into how crime varies over space? The last bar graph and the cluster analysis answer this question.
  • Will this help distribute the correct type of resources to tackle hotspots of specific crime types? The final graph indicates how likely a certain type of incident is to occur in a location given the data. Planning procedures can therefore be put in place in which the City of London Police hires officers who are better equipped to deal with, say, drug-related incidents and deploys them to hotspots of drug-related offences.
  • Is there a correlation between drug-related incidents and incidents of anti-social behaviour across London? This question could not be answered, as there are many factors for why the two may or may not be related. There is no evidence in this study that drug-related incidents later cause anti-social behaviour. However, in the cluster analysis the anti-social behaviour and drug-related clusters are always neighbouring, which can be a sign of similarities in the data.

Recommendations

In future work, it is vital that all questions are answered and that further analysis is done with the data to better understand whether crime types are related. The next stage of the research could be to design an agent-based model of a resource allocation scenario, using the data from this study to derive better resource allocation models.

References

Dustmann, Christian, and Francesco Fasani. 2016. “The Effect of Local Area Crime on Mental Health.” Economic Journal. https://doi.org/10.1111/ecoj.12205.

Malleson, Nick, and Martin A. Andresen. 2016. “Exploring the impact of ambient population measures on London crime hotspots.” Journal of Criminal Justice. https://doi.org/10.1016/j.jcrimjus.2016.03.002.

Stafford, Mai, Tarani Chandola, and Michael Marmot. 2007. “Association between fear of crime and mental health and physical functioning.” American Journal of Public Health. https://doi.org/10.2105/AJPH.2006.097154.

Sutherland, Alex, Ian Brunton-Smith, and Jonathan Jackson. 2013. “Collective efficacy, deprivation and violence in London.” British Journal of Criminology. https://doi.org/10.1093/bjc/azt050.

Whitley, Rob, and Martin Prince. 2005. “Fear of crime, mobility and mental health in inner-city London, UK.” Social Science and Medicine. https://doi.org/10.1016/j.socscimed.2005.03.044.