Sheffield module

Executive Summary
Introduction
Data
Background
Methods and Results
Conclusion
Recommendations

Executive Summary

Crime data provided by the police forces operating in London over a 12-month period will be used to produce multiple visualisations of how crime types are distributed over space and time. Machine learning techniques, specifically cluster analysis, will then be applied to the data to identify crime-type hotspots across London, which should aid police resource allocation.

Introduction

The government decided to investigate new techniques that can be deployed in policing to make up for the lack of resources. Quantitative policing has emerged with the aim of adopting Artificial Intelligence (AI) methods and applying them to conventional policing, ultimately increasing the capabilities of UK police forces. In this study, modern Data Science and AI techniques are used to answer three questions. How does crime evolve over time at street level in London? What are the correlations between different crime types? How can resource allocation be adapted to the type of crime occurring at a hotspot, so that only resources equipped to deal with that specific crime type are deployed? Ultimately, this study aims to give police forces better resource allocation planning procedures.

London was chosen as a case study because its population is larger than that of any other UK city, which means more data, and because there is a wider variety of crime types across Greater London, which means more clusters. The same method can, however, be reproduced with crime data for any region in the world.

The study follows this plan:

Download two datasets, one from the City of London Police and one from the Metropolitan Police Service, because both forces operate in London.
Because the Metropolitan Police Service data also contains incidents located outside London, merge the two datasets and then filter out the incidents whose coordinates fall outside the London boundary.
Clean the crime records and aggregate them to street level with coordinates.
Pin each crime location, labelled with its crime type, on an interactive map of London. Only a subset of the data can be plotted interactively, because rendering every point requires a large amount of computation.
Create multiple visualisations of the data to gain insight.
Use cluster analysis to estimate how likely various crimes are to occur in particular areas, and how these patterns may affect neighbouring boroughs.

Data

Data wrangling

There are 24 separate .csv files, which need to be merged into one. A short script reads the names of the separate .csv files for each area of each force and merges them into a single dataset. The result was imported into a dataframe and saved as one .csv file containing all the data, so that it can easily be re-imported and used.
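The merge step can be sketched in base R as follows. This is a minimal sketch, not the script used in the study: it assumes all the monthly files share identical column names, and the helper name `merge_crime_csvs` and the toy files are hypothetical.

```r
# Hypothetical helper: read every .csv file under `dir` (recursively, in case
# the download is split into one sub-folder per month) and stack them.
merge_crime_csvs <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE,
                      recursive = TRUE)
  if (length(files) == 0) stop("no .csv files found in ", dir)
  # check.names = FALSE keeps column names such as "Crime type" unchanged
  frames <- lapply(files, read.csv, check.names = FALSE,
                   stringsAsFactors = FALSE)
  do.call(rbind, frames)  # rbind requires the same column names in every file
}

# Self-contained demonstration with two toy files in a temporary folder
demo_dir <- file.path(tempdir(), "crime_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Month = "2018-06", `Crime type` = "Burglary",
                     check.names = FALSE),
          file.path(demo_dir, "city.csv"), row.names = FALSE)
write.csv(data.frame(Month = "2018-06", `Crime type` = "Drugs",
                     check.names = FALSE),
          file.path(demo_dir, "met.csv"), row.names = FALSE)

df_demo <- merge_crime_csvs(demo_dir)

# Save the merged data once (outside demo_dir, so a re-run does not read it back)
write.csv(df_demo, file.path(tempdir(), "crime_merged.csv"), row.names = FALSE)
```

Saving the merged file once means later sessions only need a single `read.csv()` call instead of re-reading all the monthly files.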

# Originally run with R 3.6, rgdal 1.4-4, GDAL 2.4.2 and PROJ.4 5.2.0
library(webshot)
library(sp)           # spatial classes and spTransform()
library(rgdal)        # readOGR() for shapefiles; archived on CRAN in 2023, sf::st_read() is the maintained alternative
library(ggplot2)
library(leaflet)
library(RColorBrewer)
library(colorRamps)
library(maps)
library(mapproj)
library(tigris)       # masks graphics::plot
library(tidyr)
library(dplyr)        # masks stats::filter, stats::lag and base set operations such as union()
library(formattable)

The data columns of interest are: the month in which each incident occurred; the longitude and latitude of the incident; the LSOA (Lower Layer Super Output Area) code and name, which identify the small area where each incident occurred; and finally the crime type, used for the cluster analysis. The first six rows of the dataframe are shown below.

head(df)

The column names are changed to be more informative.

Each value in the LSOA name column is a borough name followed by a space and a four-character LSOA code (names take the form "Camden 001A"). These codes could cause problems in later analysis, so they are removed by trimming the last four characters from each name.

df$`LSOA name` <- gsub('.{4}$', '', df$`LSOA name`)
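The trimming can be checked on a few example names. In this sketch the sample strings are illustrative, not taken from the data. Trimming four characters leaves a trailing space (visible in the borough list printed in the Background section), so the sketch also applies `trimws()` so that the names compare cleanly with other borough lists.

```r
# Illustrative LSOA names: borough name, a space, then a 4-character code
lsoa_names <- c("Camden 001A", "City of London 001F", "Tower Hamlets 012B")

# Drop the last four characters, as in the code above;
# this leaves a trailing space, e.g. "Camden "
boroughs <- gsub(".{4}$", "", lsoa_names)

# Remove the leftover leading/trailing whitespace
boroughs <- trimws(boroughs)
# boroughs is now c("Camden", "City of London", "Tower Hamlets")
```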

The merged dataset has 1,177,958 rows, meaning 1,177,958 crimes were recorded by the two forces between May 2018 and May 2019.

## [1] 1177958

Background

df_unique <- unique(df$`Crime type`)
print(df_unique)

##  [1] "Anti-social behaviour"        "Burglary"                    
##  [3] "Public order"                 "Bicycle theft"               
##  [5] "Drugs"                        "Other theft"                 
##  [7] "Theft from the person"        "Vehicle crime"               
##  [9] "Violence and sexual offences" "Other crime"                 
## [11] "Criminal damage and arson"    "Possession of weapons"       
## [13] "Robbery"                      "Shoplifting"
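The unique values above list the 14 crime categories but not how often each one occurs. A minimal base R sketch of that frequency count follows; the toy data frame stands in for `df`, and its values are invented for illustration.

```r
# Toy stand-in for df; only the "Crime type" column matters here
toy <- data.frame(`Crime type` = c("Burglary", "Drugs", "Burglary",
                                   "Robbery", "Burglary", "Drugs"),
                  check.names = FALSE)

# Count incidents per crime type, most frequent first
counts <- sort(table(toy$`Crime type`), decreasing = TRUE)
print(counts)

# The same counts as a data frame, convenient for plotting or formattable()
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(counts_df) <- c("Crime type", "Incidents")
```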

df_two <- unique(df$`LSOA name`)
print(df_two)

##  [1] "Camden "                 "City of London "        
##  [3] "Islington "              "Southwark "             
##  [5] "Tower Hamlets "          "Westminster "           
##  [7] "Hackney "                "Waltham Forest "        
##  [9] "Lambeth "                "Newham "                
## [11] "Haringey "               "Redbridge "             
## [13] "Barking and Dagenham "   "Barnet "                
## [15] "Bexley "                 "Brent "                 
## [17] "Brentwood "              "Bromley "               
## [19] "Croydon "                "Dartford "              
## [21] "Ealing "                 "Elmbridge "             
## [23] "Enfield "                "Epping Forest "         
## [25] "Epsom and Ewell "        "Greenwich "             
## [27] "Hammersmith and Fulham " "Harrow "                
## [29] "Havering "               "Hertsmere "             
## [31] "Hillingdon "             "Hounslow "              
## [33] "Kensington and Chelsea " "Kingston upon Thames "  
## [35] "Lewisham "               "Merton "                
## [37] "Reigate and Banstead "   "Richmond upon Thames "  
## [39] "Runnymede "              "Sevenoaks "             
## [41] "South Bucks "            "Spelthorne "            
## [43] "Sutton "                 "Tandridge "             
## [45] "Three Rivers "           "Thurrock "              
## [47] "Wandsworth "             "Watford "               
## [49] "Welwyn Hatfield "        "Broxbourne "            
## [51] "Mole Valley "            "Slough "                
## [53] "Woking "                 "Gravesham "             
## [55] "Tonbridge and Malling "  "Guildford "
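The list above contains 56 names, but only 33 of them are London local authorities (the 32 London boroughs plus the City of London); the remaining 23, such as Brentwood, Dartford and Slough, lie outside Greater London. The filtering step from the plan can be sketched by matching borough names; note this is a simpler swap-in for the coordinate-based boundary filter described in the Introduction. The names `london_boroughs` and `keep_london` are hypothetical, and the toy rows are illustrative.

```r
# The 33 local authorities that make up Greater London
london_boroughs <- c(
  "Barking and Dagenham", "Barnet", "Bexley", "Brent", "Bromley", "Camden",
  "City of London", "Croydon", "Ealing", "Enfield", "Greenwich", "Hackney",
  "Hammersmith and Fulham", "Haringey", "Harrow", "Havering", "Hillingdon",
  "Hounslow", "Islington", "Kensington and Chelsea", "Kingston upon Thames",
  "Lambeth", "Lewisham", "Merton", "Newham", "Redbridge",
  "Richmond upon Thames", "Southwark", "Sutton", "Tower Hamlets",
  "Waltham Forest", "Wandsworth", "Westminster"
)

# Keep only rows whose borough name (trailing space removed) is in London
keep_london <- function(data) {
  data[trimws(data$`LSOA name`) %in% london_boroughs, , drop = FALSE]
}

# Toy example: one London record and one record from outside London
toy <- data.frame(`LSOA name` = c("Camden ", "Dartford "),
                  `Crime type` = c("Drugs", "Burglary"),
                  check.names = FALSE)
london_only <- keep_london(toy)
```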

Methods and Results

Interactive map using Leaflet

The R package leaflet is used here to overlay the crime data on an interactive map in order to explore the data further and gain insights. Leaflet is an open-source JavaScript library for creating mobile-friendly interactive maps, and the leaflet R package provides an interface to it.
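The code behind the map is not shown, so the following is only a minimal sketch, assuming the column names shown earlier (`Longitude`, `Latitude`, `Crime type`). It plots a random sample of rows, because plotting every incident interactively is too computationally heavy, and uses leaflet's `markerClusterOptions()` so that nearby markers are grouped into clusters that split apart when clicked or zoomed. The cap `n_max = 5000` and the toy coordinates are illustrative assumptions.

```r
library(leaflet)  # also re-exports the %>% pipe

# Toy stand-in for df: a few points in central London (illustrative values)
toy <- data.frame(Longitude = c(-0.1357, -0.1246, -0.0877, -0.1419),
                  Latitude  = c(51.5136, 51.5007, 51.5136, 51.5154),
                  `Crime type` = c("Shoplifting", "Theft from the person",
                                   "Burglary", "Anti-social behaviour"),
                  check.names = FALSE)

# Plot at most n_max randomly chosen rows to keep the widget responsive
set.seed(42)
n_max <- 5000
sampled <- toy[sample(nrow(toy), min(n_max, nrow(toy))), ]

m <- leaflet(sampled) %>%
  addTiles() %>%                          # OpenStreetMap base layer
  addMarkers(lng = ~Longitude, lat = ~Latitude,
             popup = ~`Crime type`,       # show the crime type on click
             clusterOptions = markerClusterOptions())
# Print `m` in RStudio or an R Markdown document to render the map
```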

## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/sedarolmez/Documents/data_analytics_police_project/London-wards-2018_ESRI/London_Ward_CityMerged.shp", layer: "London_Ward_CityMerged"
## with 633 features
## It has 6 fields

## Warning in `proj4string<-`(`*tmp*`, value = new("CRS", projargs = "+init=epsg:27700 +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs +ellps=airy +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894")): A new CRS was assigned to an object with an existing CRS:
## +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
## without reprojecting.
## For reprojection, use function spTransform

## Regions defined for each Polygons

## Assuming "Longitude" and "Latitude" are longitude and latitude, respectively

This interactive map allows the spatial crime data to be navigated at street level. Each cluster represents the incidents that occurred within the polygon at that location. The visualisation suggests that crime is more concentrated in the City of Westminster than elsewhere. Clicking on a cluster splits it into its individual markers. Next, a scatter plot of the data is created with a label for each crime type, to show where each type of crime occurs. Finally, cluster analysis is applied to see which types of incident occur most frequently in different locations across London.
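The document does not state which clustering algorithm is used. As one possible approach, the sketch below applies base R's `kmeans()` to the coordinates of a single crime type and treats each cluster centre as a candidate hotspot. The choice of k-means, the number of centres (2) and the synthetic points are all assumptions for illustration. Longitude is scaled by the cosine of the mean latitude so that one unit covers roughly the same distance on both axes.

```r
set.seed(1)
# Synthetic incidents for one crime type around two centres (illustrative only)
pts <- data.frame(
  Longitude = c(rnorm(50, -0.135, 0.005), rnorm(50, -0.070, 0.005)),
  Latitude  = c(rnorm(50, 51.513, 0.003), rnorm(50, 51.545, 0.003))
)

# Scale longitude by cos(latitude) so both axes are in comparable units
lat0 <- mean(pts$Latitude) * pi / 180
xy <- cbind(x = pts$Longitude * cos(lat0), y = pts$Latitude)

# k-means with several random starts; 2 centres is an assumed value
fit <- kmeans(xy, centers = 2, nstart = 10)

# Convert cluster centres back to longitude/latitude: candidate hotspots
hotspots <- data.frame(Longitude = fit$centers[, "x"] / cos(lat0),
                       Latitude  = fit$centers[, "y"],
                       Incidents = as.vector(table(fit$cluster)))
print(hotspots)
```

In practice the number of centres would need to be chosen from the data, for example by comparing the within-cluster sum of squares (`fit$tot.withinss`) across several values.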

Clustering location data to identify crime hotspots

## Warning: Ignoring unknown parameters: legend

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

Type of crime	Cluster
Theft from the person	1
Robbery	2
Other theft	3
Bicycle theft	4
Criminal damage and arson	5
Other crime	6
Anti-social behaviour	7
Burglary	8
Violence and sexual offences	9
Possession of weapons	10
Public order	11
Shoplifting	12
Drugs	13
Vehicle crime	14
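The `mean.default` warnings printed above are what R emits when `mean()` is applied to a character column, and the affected results are `NA`. The code that produced them is not shown, so this is an inference from the warning text: it is consistent with every column, including `Crime type` or `LSOA name`, being averaged within each group. A sketch that averages only the numeric coordinate columns per crime type, giving one centre point for each cluster in the table above, could look like this (toy data, illustrative values).

```r
# Toy stand-in for df (illustrative values)
toy <- data.frame(Longitude = c(-0.13, -0.12, -0.07, -0.08),
                  Latitude  = c(51.51, 51.52, 51.55, 51.54),
                  `Crime type` = c("Robbery", "Robbery", "Drugs", "Drugs"),
                  `LSOA name` = c("Westminster", "Westminster",
                                  "Hackney", "Hackney"),
                  check.names = FALSE)

# Average only the numeric coordinates within each crime type;
# passing character columns to mean() is what triggers the NA warnings
centres <- aggregate(toy[c("Longitude", "Latitude")],
                     by = list(`Crime type` = toy$`Crime type`),
                     FUN = mean)
print(centres)
```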

Conclusion

Recommendations

(To be completed: how the analysis will be taken further, and recommendations for the government and councils.)