library(tinytex)
Using police crime records from 2018-05 to 2019-05, can we map the spatial distribution of crime types across Greater London and cluster the records to gain further insight into how crime evolves over time? Will this help direct the right type of resource to hotspots of specific crime types? Is there a correlation between drug-related incidents and incidents of anti-social behaviour across London? This study investigates these questions using data visualisation and machine learning, specifically cluster analysis. The data source is: https://data.police.uk/data/
Crime data provided by police forces in London over a 12-month period will be used to produce multiple visualisations of crime type distributions over space and time. Machine learning techniques, more specifically cluster analysis, will then be applied to the data to derive crime type hotspots across London, which should aid police resource allocation.
The government decided to investigate new techniques that can be deployed in policing to make up for the lack of resources. Quantitative policing has since emerged with the intention of applying Artificial Intelligence (AI) methods to conventional policing, ultimately increasing the UK Police Force’s capabilities. In this study, modern Data Science and Artificial Intelligence techniques will be used to understand how crime evolves over time in London at street level. In addition, what are the correlations between different crime types? Finally, how can resource allocation be adapted to the type of crime occurring at a hotspot, so that only resources equipped to deal with that specific crime type are deployed? Ultimately, this study should provide the Police Force with better resource allocation planning procedures. London was chosen as the case study because its population is larger than that of any other UK city (meaning more data) and because crime types across Greater London are more diverse (meaning more clusters). The method can be reproduced with crime data of any type for any region of the world.
The study will follow the plan:
We have 24 separate .csv files, which need to be merged into one. A short script reads the names of the separate .csv files for each area of each force and merges them. The merged data was then imported into a dataframe and saved as a single csv so that it can easily be re-imported and used.
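The merging script itself is not shown, so the following is only a minimal sketch of how such a merge could look in base R. The helper name `merge_crime_csvs`, the folder layout and the two demo files are all hypothetical; it assumes every file shares the same column headers.

```r
# Sketch only: merge every .csv found under a folder into one data frame.
# `merge_crime_csvs` and all file and folder names here are hypothetical.
merge_crime_csvs <- function(data_dir) {
  csv_paths <- list.files(data_dir, pattern = "\\.csv$",
                          recursive = TRUE, full.names = TRUE)
  if (length(csv_paths) == 0) stop("no .csv files found in ", data_dir)
  # check.names = FALSE keeps headers such as "Crime type" unchanged
  frames <- lapply(csv_paths, read.csv,
                   check.names = FALSE, stringsAsFactors = FALSE)
  # Stack the per-file data frames row-wise (requires identical columns)
  do.call(rbind, frames)
}

# Two small made-up files standing in for the monthly police downloads
demo_dir <- tempfile("crime_demo")
dir.create(demo_dir)
write.csv(data.frame(Month = "2018-05", `Crime type` = "Burglary",
                     check.names = FALSE),
          file.path(demo_dir, "2018-05-a.csv"), row.names = FALSE)
write.csv(data.frame(Month = "2018-06", `Crime type` = "Drugs",
                     check.names = FALSE),
          file.path(demo_dir, "2018-06-b.csv"), row.names = FALSE)

merged <- merge_crime_csvs(demo_dir)

# Save the merged data once so later sessions can re-import a single file
write.csv(merged, tempfile(fileext = ".csv"), row.names = FALSE)
```

Reading with `check.names = FALSE` is a design choice here: it preserves column names that contain spaces, which matches the backtick-quoted names used later in the analysis.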
library(webshot)
library(sp)
library(rgdal)
## rgdal: version: 1.4-4, (SVN revision 833)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.4.2, released 2019/06/28
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/gdal
## GDAL binary built with GEOS: FALSE
## Loaded PROJ.4 runtime: Rel. 5.2.0, September 15th, 2018, [PJ_VERSION: 520]
## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/proj
## Linking to sp version: 1.3-1
library(ggplot2)
library(rgdal)
library(leaflet)
library(RColorBrewer)
library(colorRamps)
library(maps)
library(mapproj)
library(tigris)
## To enable
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
##
## Attaching package: 'tigris'
## The following object is masked from 'package:graphics':
##
## plot
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(rgdal)
library(formattable)
The first six rows of the dataframe are shown below. The columns of interest are: the month in which the incident occurred, the longitude and latitude coordinates of the incident, the LSOA code and name to locate specific incidents, and finally the crime type for cluster analysis.
head(df)
The column names are changed to be more informative.
In the LSOA name column, each value ends with a four-character code after the borough name. This could cause problems in later analysis, so the last four characters are trimmed.
df$`LSOA name` <- gsub('.{4}$', '', df$`LSOA name`)
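Trimming exactly four characters leaves the space that separated the borough from its code, which is why the borough names listed later carry a trailing space. As a hedged alternative, the sketch below removes the separating whitespace together with the code. The helper `strip_lsoa_suffix` is hypothetical, and the example values are made up to follow the "borough name plus four-character code" pattern.

```r
# Sketch only: drop the four-character LSOA code and the space before it.
# `strip_lsoa_suffix` is a hypothetical helper, not part of the original analysis.
strip_lsoa_suffix <- function(lsoa_name) {
  # "\\s+\\S{4}$" matches whitespace followed by four non-space characters
  # at the very end of the string
  sub("\\s+\\S{4}$", "", lsoa_name)
}

# Made-up example values
example_names <- c("Camden 001A", "Barking and Dagenham 002B")
trimmed_names <- strip_lsoa_suffix(example_names)
print(trimmed_names)
```

Applying `trimws()` to the column after the original `gsub()` call would be another way to reach the same result.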
There are 1,177,958 rows in total, meaning 1,177,958 reported crimes across Greater London from 05-2018 to 05-2019.
## [1] 1177958
df_unique <- unique(df$`Crime type`)
print(df_unique)
## [1] "Anti-social behaviour" "Burglary"
## [3] "Public order" "Bicycle theft"
## [5] "Drugs" "Other theft"
## [7] "Theft from the person" "Vehicle crime"
## [9] "Violence and sexual offences" "Other crime"
## [11] "Criminal damage and arson" "Possession of weapons"
## [13] "Robbery" "Shoplifting"
df_two <- unique(df$`LSOA name`)
print(df_two)
## [1] "Camden " "City of London "
## [3] "Islington " "Southwark "
## [5] "Tower Hamlets " "Westminster "
## [7] "Hackney " "Waltham Forest "
## [9] "Lambeth " "Newham "
## [11] "Haringey " "Redbridge "
## [13] "Barking and Dagenham " "Barnet "
## [15] "Bexley " "Brent "
## [17] "Brentwood " "Bromley "
## [19] "Croydon " "Dartford "
## [21] "Ealing " "Elmbridge "
## [23] "Enfield " "Epping Forest "
## [25] "Epsom and Ewell " "Greenwich "
## [27] "Hammersmith and Fulham " "Harrow "
## [29] "Havering " "Hertsmere "
## [31] "Hillingdon " "Hounslow "
## [33] "Kensington and Chelsea " "Kingston upon Thames "
## [35] "Lewisham " "Merton "
## [37] "Reigate and Banstead " "Richmond upon Thames "
## [39] "Runnymede " "Sevenoaks "
## [41] "South Bucks " "Spelthorne "
## [43] "Sutton " "Tandridge "
## [45] "Three Rivers " "Thurrock "
## [47] "Wandsworth " "Watford "
## [49] "Welwyn Hatfield " "Broxbourne "
## [51] "Mole Valley " "Slough "
## [53] "Woking " "Gravesham "
## [55] "Tonbridge and Malling " "Guildford "
The R package leaflet is used here to overlay the data on an interactive map, to further explore the data and gain insights. Leaflet is an open-source JavaScript library that can be used from R to create mobile-friendly interactive maps.
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/sedarolmez/Documents/data_analytics_police_project/London-wards-2018_ESRI/London_Ward_CityMerged.shp", layer: "London_Ward_CityMerged"
## with 633 features
## It has 6 fields
## Warning in `proj4string<-`(`*tmp*`, value = new("CRS", projargs = "+init=epsg:27700 +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs +ellps=airy +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894")): A new CRS was assigned to an object with an existing CRS:
## +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
## without reprojecting.
## For reprojection, use function spTransform
## Regions defined for each Polygons
## Assuming "Longitude" and "Latitude" are longitude and latitude, respectively
This interactive map allows you to navigate the spatial crime data at street level. Each cluster represents the incidents that occurred within the polygon at that location, and clicking a cluster breaks it apart into its individual markers. The visualisation shows that a higher proportion of recorded crime falls within the City of Westminster. Next, a scatter plot of the data will be created with a label for each crime type to analyse where each type of crime is occurring. Finally, cluster analysis of the data will be implemented to see which type of incident occurs more frequently in various locations across London.
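The code behind the map is not shown. The sketch below illustrates how a clustered marker map of this kind can be built with leaflet. The three incidents and the `crime_type` column name are made up for illustration; only the `Longitude` and `Latitude` column names come from the output above.

```r
library(leaflet)

# Made-up incidents; the real analysis would use the merged police data.
incidents <- data.frame(
  Longitude = c(-0.1419, -0.1357, -0.0877),
  Latitude = c(51.5014, 51.5101, 51.5079),
  crime_type = c("Drugs", "Robbery", "Burglary")  # hypothetical column name
)

# markerClusterOptions() groups nearby markers into clusters that split
# apart when clicked or zoomed into
crime_map <- leaflet(incidents) %>%
  addTiles() %>%
  addMarkers(lng = ~Longitude, lat = ~Latitude, popup = ~crime_type,
             clusterOptions = markerClusterOptions())

# Printing `crime_map` in an interactive session renders the map.
```

Naming `lng` and `lat` explicitly avoids the "Assuming ... are longitude and latitude" message that leaflet prints when it has to guess the coordinate columns.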
Clustering longitudinal data to identify crime hotspots.
## Warning: Ignoring unknown parameters: legend
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
| Type of crime | Cluster |
|---|---|
| Theft from the person | 1 |
| Robbery | 2 |
| Other theft | 3 |
| Bicycle theft | 4 |
| Criminal damage and arson | 5 |
| Other crime | 6 |
| Anti-social behaviour | 7 |
| Burglary | 8 |
| Violence and sexual offences | 9 |
| Possession of weapons | 10 |
| Public order | 11 |
| Shoplifting | 12 |
| Drugs | 13 |
| Vehicle crime | 14 |
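The table assigns one cluster to each crime type, and the `mean.default` warning shown earlier suggests that a mean was taken over columns that are not numeric. The original clustering code is not shown, so the following is only a sketch under the assumption that each cluster is summarised by the mean coordinates of its crime type. The data and the `crime_type` column name are made up.

```r
# Made-up incidents; `crime_type` is a hypothetical column name.
incidents <- data.frame(
  crime_type = c("Drugs", "Drugs", "Robbery", "Robbery"),
  Longitude = c(-0.10, -0.12, -0.08, -0.06),
  Latitude = c(51.50, 51.52, 51.51, 51.53)
)

# Average only the numeric coordinate columns within each crime type,
# so mean() is never applied to text columns.
centroids <- aggregate(cbind(Longitude, Latitude) ~ crime_type,
                       data = incidents, FUN = mean)

# Number the groups to obtain a cluster id per crime type
centroids$cluster <- seq_len(nrow(centroids))
print(centroids)
```

Restricting the aggregation to `Longitude` and `Latitude` is what keeps the "argument is not numeric or logical" warning from appearing.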
(how I will take my analysis further. Recommend for government/councils)