Crime rates in Sheffield City Centre

October - December 2021


This report combines open data from the Companies House API with UK Police API street-level crime statistics. The initial scope is focussed on companies registered in the S1 area of Sheffield city centre together with crimes recorded during Q4 2021, October to December.

Geocoding data is combined with both company and crime data to provide insight into crime and company locations across the city centre, detailing the type of crime and indicating crime levels.

The uses of crime data are widespread and include operational and resource allocation decisions by law enforcement, local councils, government agencies, businesses and other groups including Universities. Universities are being actively encouraged to collect and publish crime statistics to help inform student choice and improve student welfare.

Future developments could include further analysis of business sectors, expanding coverage to other UK areas, automating the UK Police Data API requests, expanding the time period covered, and improving user interaction and functionality with dynamic geospatial location services.


Introduction

APIs (Application Programming Interfaces) are used every day in apps such as Facebook, maps and weather apps. An API allows a user to request data, which is returned in a response in XML or, now more commonly, JSON format. JSON (JavaScript Object Notation) is more compact and simpler to parse than XML, and its use has grown with the popularity of RESTful APIs. REST (Representational State Transfer) is an architectural style, a set of rules under which the user of the API asks for data by calling a specific URL; this call is the request, and the data sent back is the response.

There are four main types of API: open, partner, internal and composite. For this project only open, or public, APIs were researched.

Hundreds of open APIs are accessible and the following selection was investigated: OpenWeather API, Companies House Public Data API, OpenStreetMap, ONS Data Hub, Google Cloud API, DEFRA and UK Police Data.

Unfortunately, the UK Data Service Open Data API was unavailable, and without a student account for the OpenWeather API it was difficult to retrieve enough data for the report. The OpenWeather data was also limited to one set of co-ordinates per city. Although some steps had been taken towards data transformation and simple metrics had already been developed, it was decided that this data would not suit the geocoding planned for this report.

Considerable thought was then given to the potential onward use of the data available. The UK Police Data also included latitude and longitude, and the aim was to combine it with the Companies House data by latitude and longitude. The data with the best fit for geocoding was the company-level data from the Companies House API together with Yorkshire and Humber postcode data from Doogal.

To ensure that the GDPR principles were followed, no special category data was used, and only the minimum data relevant to this report was requested.


Data flow

Companies House API

Companies House API is a REST API. In practice it was very difficult to obtain a large quantity of specific data with one simple API request, as the data required was a specific subset of Sheffield companies in S1. Companies House offers a Free Company Data Product split into 6 large zip files. All the bulk csv files were downloaded, unzipped and individually searched for companies with a postcode area of S1. A list of 1,200 company numbers was saved to csv and used as the basis for the API requests.

Requests were made using the Company profile GET endpoint, 'https://api.company-information.service.gov.uk/company/[company_number]'. Details of the data schema are shown below in JSON format.

{ "company_name": "string", "company_number": "string", "company_status": "string", "date_of_creation": "date", "company_type": "string" }

Properties of the data set are also detailed. The company_status field contains 8 categories: active, dissolved, liquidation, receivership, administration, voluntary-arrangement, converted-closed and insolvency-proceedings. This variable is used to filter the data, so it is important to be familiar with all the categories available.

The response is encoded in UTF-8 (Unicode Transformation Format, 8-bit), an encoding system for Unicode and the most common encoding on the internet (over 96%). The JSON response must be decoded from UTF-8 before it can be used. The data also has to be flattened, as it arrives in nested layers.
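The decoding and flattening steps can be illustrated with a minimal sketch. The sample response body below is invented, the nested field names (registered_office_address, postal_code, sic_codes) are assumptions about the response shape, and the flatten helper is hypothetical rather than the code used for this report.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value  # lists and scalars are kept as they are
    return flat


# Invented response body, encoded as UTF-8 bytes as it would arrive over HTTP.
raw_bytes = json.dumps({
    "company_name": "EXAMPLE LTD",
    "company_number": "01234567",
    "company_status": "active",
    "registered_office_address": {"postal_code": "S1 2AB", "locality": "Sheffield"},
    "sic_codes": ["62012"],
}).encode("utf-8")

profile = json.loads(raw_bytes.decode("utf-8"))  # decode bytes, then parse JSON
flat_profile = flatten(profile)
```

Each flattened record then has one column per field, for example `registered_office_address.postal_code`, which makes it straightforward to append records into a single table.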

To retrieve data for so many companies, a loop was needed to request data for each company number and append it to a main dataset. The request had to be split into several segments because of issues with combining the data in the loop. One limitation was that Companies House allows only 600 requests within a 5-minute period, so the code paused part way through for 301 seconds.
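One way to structure such a loop is sketched below. It is an illustration rather than the code used for this report: fetch_profile and fetch_all are hypothetical helper names, and the sketch assumes the API key is sent as the HTTP basic-auth username with an empty password. The fetch and sleep functions are passed in so that the batching logic can be exercised without calling the live API.

```python
import base64
import json
import time
import urllib.request

BASE_URL = "https://api.company-information.service.gov.uk/company/"


def fetch_profile(company_number, api_key):
    """Request one company profile (assumes key-as-username basic auth)."""
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        BASE_URL + company_number,
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all(company_numbers, fetch, batch_size=600, pause_seconds=301,
              sleep=time.sleep):
    """Call `fetch` for every company number, pausing after each full batch."""
    results = []
    for i, number in enumerate(company_numbers):
        if i > 0 and i % batch_size == 0:
            sleep(pause_seconds)  # wait out the 5-minute rate-limit window
        results.append(fetch(number))
    return results
```

A live run might look like `fetch_all(numbers, lambda n: fetch_profile(n, api_key))`. With 1,200 company numbers and a batch size of 600, the loop pauses once, after the 600th request.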

Because of these issues with the loop and the time delay, this code has been cached and does not run live in this report. The output data have been saved to csv and are read back in at the start of the next chunk of code.

The data sets are appended, only the required variables are selected, and the data is filtered for active companies in the S1 area. The raw data consists of company name, number, status and type, date of creation, address data including postcode, and SIC codes. The data is written to API_Raw_Data.csv.

Further transformation involved renaming several variables, extracting substrings from the postcode, calculating company age, constructing an age grouping and label, and filtering out instances where company records are duplicated due to previous company names. This left a dataset of just under 500 distinct companies with a company status of active and registered in the S1 postcode area.
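A rough sketch of these transformation steps follows. The record keys, the reference date and, in particular, the age bands are assumptions made for illustration; the report does not state the bands it used.

```python
from datetime import date


def outward_code(postcode):
    """Return the outward part of a UK postcode, e.g. 'S1 2AB' -> 'S1'."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]  # the inward code is always the last three characters


def company_age_years(date_of_creation, as_of):
    """Whole years between an ISO creation date and the reference date."""
    created = date.fromisoformat(date_of_creation)
    years = as_of.year - created.year
    if (as_of.month, as_of.day) < (created.month, created.day):
        years -= 1  # anniversary not yet reached in the reference year
    return years


def age_group(years):
    """Hypothetical age bands, chosen only for this example."""
    if years < 2:
        return "0-1 years"
    if years < 5:
        return "2-4 years"
    if years < 10:
        return "5-9 years"
    return "10+ years"


def transform(records, as_of):
    """Keep active S1 companies, drop duplicate company numbers, add age fields."""
    seen, cleaned = set(), []
    for rec in records:
        number = rec["company_number"]
        if rec["company_status"] != "active":
            continue
        if outward_code(rec["postcode"]) != "S1" or number in seen:
            continue
        seen.add(number)
        age = company_age_years(rec["date_of_creation"], as_of)
        cleaned.append({**rec, "outward_code": "S1",
                        "age_years": age, "age_group": age_group(age)})
    return cleaned
```

Deriving the outward code from the last three characters, rather than splitting on the space, keeps S1 distinct from districts such as S10 even when a postcode is stored without a space.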


Geocoding

Geocoding translates address data, such as postcodes, into latitude and longitude co-ordinates that can then be plotted on an interactive map. Postcode data for Yorkshire and Humber was downloaded in csv format from Doogal.

The csv data required some transformation: the trailing 0 on the right of each number had been removed, so the latitude and longitude were padded back up to 9 characters. Outward area and sector were extracted from the postcode and the data filtered to Sheffield S1 only. Two concatenated area fields were created for later aggregation and labelling, and all latitude and longitude fields were set to numeric.

An extra postcode level at the S1 1A* level was created. The maximum and minimum latitude and longitude for each level were calculated, giving a range, together with the corresponding mid points.
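The padding, sector extraction and range calculation might look something like the sketch below. The row keys and sample values are assumptions, and only the sector level is shown; the same grouping could be applied to the other postcode levels.

```python
from collections import defaultdict


def pad_coordinate(text, width=9):
    """Right-pad a coordinate string with zeros, e.g. '53.38125' -> '53.381250'."""
    return text.ljust(width, "0")


def sector(postcode):
    """Return the postcode sector, e.g. 'S1 2AB' -> 'S1 2'."""
    compact = postcode.replace(" ", "").upper()
    return f"{compact[:-3]} {compact[-3]}"


def ranges_by_sector(rows):
    """Min, max and mid point of latitude and longitude for each sector."""
    grouped = defaultdict(list)
    for row in rows:
        lat = float(pad_coordinate(row["latitude"]))
        lon = float(pad_coordinate(row["longitude"]))
        grouped[sector(row["postcode"])].append((lat, lon))

    summary = {}
    for key, points in grouped.items():
        lats = [lat for lat, _ in points]
        lons = [lon for _, lon in points]
        summary[key] = {
            "lat_min": min(lats), "lat_max": max(lats),
            "lat_mid": (min(lats) + max(lats)) / 2,
            "lon_min": min(lons), "lon_max": max(lons),
            "lon_mid": (min(lons) + max(lons)) / 2,
        }
    return summary
```

Padding with trailing zeros does not change the numeric value once the string is converted; it only restores a consistent width while the co-ordinates are still held as text.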

The data was then filtered down to 488 ‘in use’ Sheffield postcodes with an outward code of S1. The postcode area S1 contains 66 postcode districts with 5 postcode sectors, each sector containing approximately 3,000 addresses, and 488 active unit postcodes (source: ONS Postal geography).

MLSOA (middle layer super output areas) and LLSOA (lower layer super output areas) have been included in the data for context and aggregation. These output areas were designed to improve the reporting of small area statistics and are built up from groups of output areas. Statistics for both lower and middle layer super output areas were originally released in 2004 for England and Wales and were reviewed for the 2011 Census. An average LLSOA contains between 1,000 and 3,000 people and an MLSOA between 5,000 and 15,000.

Every LLSOA has an IMD. The Index of Multiple Deprivation is the official measure of relative deprivation experienced by people living in an area. The indices rank each LLSOA from 1 to 32,844 across England, 1 being the most deprived and 32,844 the least deprived. IMD combines seven weighted domains: Income Deprivation (22.5%), Employment Deprivation (22.5%), Education, Skills and Training Deprivation (13.5%), Health Deprivation and Disability (13.5%), Crime (9.3%), Barriers to Housing and Services (9.3%) and Living Environment Deprivation (9.3%). A more detailed explanation can be found in the English Indices of Deprivation.

Further investigation into obtaining postcode polygon data would increase the accuracy of locating any co-ordinate within a given postcode.

The postcode data was then joined to the company data, any companies with missing latitudes were filtered out, and latitude and longitude were set to numeric.

The data was written to API_Clean_Data_company.csv.

UK Police API

UK Police street-level crime data was obtained from the UK Police Data API for the three latest months available, October to December 2021, using the co-ordinates for Sheffield (latitude = 53.3811, longitude = -1.4701). The response was in the same format as the Companies House API data and so did not require any new transformation, other than converting the latitude and longitude to numeric and adding a count variable.

Separate requests were made for each month and then combined into one dataset. A future development would be to calculate the latest available month, about 3-4 weeks after the current month's end, and to construct a list of months with a loop to request the crime data for the latest n months. It would also be interesting to compare crime statistics from 2019 with those from 2020 through the lens of the pandemic.
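The month loop described as a future development could be sketched as follows. The helper names are hypothetical, and the sketch assumes the street-level endpoint crimes-street/all-crime with lat, lng and date query parameters; it only builds the request URLs and does not call the API or work out which month is the latest available.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.police.uk/api/crimes-street/all-crime"


def last_n_months(latest_year, latest_month, n):
    """Return n 'YYYY-MM' strings ending at the latest month, oldest first."""
    months = []
    year, month = latest_year, latest_month
    for _ in range(n):
        months.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:  # step back across a year boundary
            year, month = year - 1, 12
    return list(reversed(months))


def crime_urls(lat, lng, months):
    """Build one street-level crime request URL per month."""
    return [
        f"{BASE_URL}?{urlencode({'lat': lat, 'lng': lng, 'date': month})}"
        for month in months
    ]


# The three months covered by this report, requested around Sheffield.
urls = crime_urls(53.3811, -1.4701, last_n_months(2021, 12, 3))
```

Each URL would then be requested in turn and the monthly responses appended into one dataset, as was done by hand for this report.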

The API data is already anonymised: the street-level latitude and longitude represent only the approximate location of a crime, and the data is aggregated to month level.

The requested crime data was then indexed and combined with the postcode data. By specifying Sheffield co-ordinates in the request, the data output was from approximately a one-mile radius and therefore contained some records outside the set S1 parameters. The co-ordinates of each offence were checked against the specified S1 range, set between latitude 53.37154 and 53.38618 and longitude -1.481137 and -1.460491; those outside this range were filtered out.
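The range check amounts to a simple bounding-box filter, sketched below with the limits quoted above. The record keys and sample points are assumptions; the sketch expects flattened records whose latitude and longitude may still be strings.

```python
# S1 bounding box, using the limits given in this report.
LAT_MIN, LAT_MAX = 53.37154, 53.38618
LON_MIN, LON_MAX = -1.481137, -1.460491


def in_s1(lat, lon):
    """True when the point lies inside the S1 latitude/longitude range."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def filter_to_s1(crimes):
    """Keep only crime records whose co-ordinates fall inside the S1 range."""
    return [
        crime for crime in crimes
        if in_s1(float(crime["latitude"]), float(crime["longitude"]))
    ]
```

A rectangular range is only an approximation of the S1 boundary, so a few records near the edges may be kept or dropped incorrectly.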

Further transformation was required before this data could be joined with the postcode data, as its latitude and longitude could not be directly matched to the latitude and longitude in the postcode data. The datasets were joined by a cartesian join, so every crime was matched with every postcode. The difference between the crime and postcode latitudes was added to the difference between the longitudes to give a total difference, and the data was then filtered to keep only the record with the least total difference for each crime, and hence the closest postcode match.
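The matching can be sketched as below. Two assumptions are made: the differences are taken as absolute values, so that opposite signs cannot cancel each other out, and each crime record carries an id. The record keys and helper name are hypothetical.

```python
from itertools import product


def nearest_postcode(crimes, postcodes):
    """Match each crime to the postcode with the smallest total difference."""
    best = {}
    # Cartesian join: every crime is compared with every postcode.
    for crime, pc in product(crimes, postcodes):
        total_diff = (abs(crime["latitude"] - pc["latitude"])
                      + abs(crime["longitude"] - pc["longitude"]))
        current = best.get(crime["id"])
        if current is None or total_diff < current[0]:
            best[crime["id"]] = (total_diff, pc["postcode"])
    return {crime_id: postcode for crime_id, (_, postcode) in best.items()}
```

Summing differences in degrees is a convenient proxy rather than a true distance, because a degree of longitude covers less ground than a degree of latitude at Sheffield's latitude. Over an area as small as S1 this is unlikely to change many matches, but a proper distance formula would be more accurate.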

The data was written to API_Clean_Data_crime.csv.


Crime KPIs

As shown below in the Q4 Crime Categories and Q4 Crime Categories by Month plots, anti-social-behaviour, violent-crime and public-order-offences are by far the most frequent offences in Q4. Broken down by month, there is an overall decrease in the number of offences in November and December compared with October. This variation could be attributed to lockdown in the city centre during that period, but further data would be required to fully analyse these trends. During December, anti-social-behaviour and violent-crime decreased whilst bicycle-theft and theft-from-person increased dramatically.

The two KPIs above show overall crimes recorded by quarter and could be expanded to a 12-month period to analyse trends. These plots, together with the geographical and proximity context from the map, could be used to monitor overall crime levels and, more importantly, specific types of crime. This would allow a more holistic approach to crime prevention, supporting decisions on improving street lighting, deploying city centre ambassadors, creating youth schemes, and developing areas to encourage new business.

The map below details the types and frequency of crime recorded by latitude and longitude. Each postcode has been rated a low, medium or high crime area, with low <=5 and high >=40.
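The rating rule can be expressed as a short sketch. It assumes the thresholds apply to the number of crimes matched to each postcode, with anything between the two limits rated medium; the function names are hypothetical.

```python
from collections import Counter


def crime_rating(count):
    """Rate a postcode by its crime count: low <= 5, high >= 40, else medium."""
    if count <= 5:
        return "low"
    if count >= 40:
        return "high"
    return "medium"


def rate_postcodes(crime_postcodes):
    """Count crimes per postcode and attach the rating to each count."""
    counts = Counter(crime_postcodes)
    return {pc: (n, crime_rating(n)) for pc, n in counts.items()}
```

For example, a postcode with 6 matched crimes would be rated medium, and one with 40 would be rated high.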


Finally, company data was appended to crime data at postcode level; this stacked data allows both data sets to be mapped together. The crime rating by postcode was added for all crimes and, where applicable, for companies.

The data was written to API_Clean_Data.csv.

Due to the project parameters for S1, some LLSOA areas have been bisected and so unfortunately the number of postcodes within each area is no longer balanced.

Of the 488 postcodes in S1, broken down by MLSOA and LLSOA, by far the biggest area is Cathedral & Kelham - Sheffield 073D, with an IMD of 17388 (50% least deprived) and 129 postcodes. By comparison, Devonshire Quarter - Sheffield 074A, with an IMD of 5388 (20% most deprived), contains only 38 postcodes in S1. Therefore, the data could not be analysed at this level of aggregation.

Analysing the top 20 highest crime rates for Q4 shows that crime rates are higher overall in the less deprived areas; this again could be due to the size of the areas and their higher daytime populations. There are drawbacks to making direct comparisons between postcodes and LLSOAs, as they do not have equivalent populations: it is not possible to directly compare Sheffield 073D, with a daytime population of 17,209, with Sheffield 042A, with a daytime population of 2,588. Future developments could incorporate additional data on LLSOAs and postcodes.

The top 20 highest number of companies registered in 2020 by postcode indicates that company growth during 2020 also favoured areas of least deprivation. The usefulness of this data is limited, as the data provided by Companies House details the registered address of the company and may not indicate an actual presence at that address. Also, the data does not include all business premises and so does not give an accurate view of businesses active in the city centre.

Distribution of Postcodes within S1

Area                                  IMD     IMD Category         Postcodes
Cathedral & Kelham - Sheffield 073A   17635   50% least deprived   39
Cathedral & Kelham - Sheffield 073B   17706   50% least deprived   39
Cathedral & Kelham - Sheffield 073D   17388   50% least deprived   129
Devonshire Quarter - Sheffield 074A   5388    20% most deprived    38
Devonshire Quarter - Sheffield 074C   10947   40% most deprived    69
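
The IMD categories in this table are consistent with splitting the 32,844 ranks into ten equal deciles. The sketch below works on that assumption; it is not a published formula, and the function names are hypothetical.

```python
import math

TOTAL_LSOAS = 32844  # number of ranked areas in England, as stated in this report


def imd_decile(rank, total=TOTAL_LSOAS):
    """Decile 1 holds the most deprived tenth of ranks, decile 10 the least."""
    return math.ceil(rank * 10 / total)


def imd_category(rank):
    """Label a rank as 'N% most deprived' or 'N% least deprived'."""
    decile = imd_decile(rank)
    if decile <= 5:
        return f"{decile * 10}% most deprived"
    return f"{(11 - decile) * 10}% least deprived"
```

Under this assumption a rank of 5388 falls in decile 2 and a rank of 17388 in decile 6, matching the categories shown for Sheffield 074A and Sheffield 073D.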

Although data at the postcode level is informative, it is not as intuitive as seeing the data on an interactive map. It is now possible to view company locations alongside the level of crime in Sheffield city centre. The radius of each circle is proportional to the number of crimes. The MLSOA, postcode, IMD, number of companies/crimes and the crime rating for that postcode are visible in the popup label.

Areas of higher crime can be easily identified, and the distribution of companies can be compared directly alongside them. Businesses specialising in security provision would be able to target high crime areas, co-ordinate resources and offer consolidated services such as neighbourhood watch, security patrols and CCTV.


Conclusion

Open API data sources provide an immense resource of data that can allow companies to add value to internal data systems, improve marketing, target new customers, offer customers additional insight, and enhance company strategy.

There are many use cases where open data has created extra value for businesses. UK crime data would allow many businesses to monitor crime levels in their local areas. Businesses offering security services could identify potential companies or groups of companies in areas with high crime rates, allowing them to offer consolidated services. Data could be integrated with internal company premises data and other metrics.

The Office for Students is encouraging Universities to make local crime statistics accessible to students. Crime could be monitored in and around campus and student accommodation locations, improving risk management and student safety. Students would also be able to access this data to inform University and accommodation choices. Local councils can monitor crime levels and locations to improve the city centre experience.

The data pipeline could be scaled up to reach wider locations and build a larger historical crime dataset. This level of insight could give local businesses a means of monitoring local crime levels and seasonal trends, focusing on specific types of crime.

There are some limitations to the data: crime locations are approximate and are mapped loosely to the nearest postcode, and company data relates to registered addresses only. The categories of crime recorded could be further grouped by severity of crime.

Improvements could also be made to the UI for the maps, adding the ability to use the user's location and expanding the parameters of the mapping to the whole of the UK.