Sheffield City Centre Companies and Crime Rates

October - December 2021


This report combines open data on companies based in the S1 area of Sheffield city centre with geocoded crime statistics from October to December 2021. It gives insight into crime rates across the city centre by showing company locations alongside areas with high crime levels. The intended audience includes risk management and security providers.

Future developments could include analysing business sectors through SIC codes, expanding coverage to other areas and cities, and improving the plots, functionality and interactivity.


Introduction

Initially many APIs were investigated: OpenWeather API, Companies House API, OpenStreetMap, ONS, ONS Data Hub, Google Cloud API and UK Police Data.

Unfortunately, the UK Data Service Open Data API was unavailable, and without a student account for the OpenWeather API it was difficult to retrieve enough data for the report. The OpenWeather data was also limited to one set of co-ordinates per city. Although some steps had been taken towards data transformation and simple metrics had already been developed, it was decided that this data would not suit the geocoding planned for this report.

Considerable thought was then given to the potential onward use of the available data. While researching APIs it became clear that the UK Police Data also included latitude and longitude, so the aim was to connect it with the Companies House data at the postcode level. The data with the best fit for geocoding was the company-level data from the Companies House API together with Yorkshire and Humber postcode data from the ONS. To ensure that the GDPR principles were followed, no special category data was requested, and only the minimum data relevant to this report was requested.


Data

Company data

In practice it was very difficult to obtain a large quantity of specific data with one simple API request, as the data was required for a subset of Sheffield. All the bulk csv files were downloaded, unzipped and individually searched for companies with a postcode area of S1. A list of 1,200 company numbers was saved to csv and used as the basis for the API requests. The requests had to be split into several segments because of issues with combining the data in the loop. Companies House also allows only 600 requests within a 5 minute period, so the code was delayed by 301 seconds between segments.
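The batching and delay described above can be sketched as follows. This is a minimal Python illustration, not the report's own loop: `fetch_in_batches` and its `fetch` argument are hypothetical names, and the function that actually calls Companies House is left to the caller. Only the 600-request limit and the 301-second pause come from the text.

```python
import time
from typing import Callable, Iterable, List

RATE_LIMIT = 600      # requests allowed per 5 minute window (from the report)
PAUSE_SECONDS = 301   # delay used in the report between segments


def fetch_in_batches(
    company_numbers: Iterable[str],
    fetch: Callable[[str], dict],
    batch_size: int = RATE_LIMIT,
    pause: float = PAUSE_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Call fetch() once per company number, pausing after every full batch.

    `fetch` is a hypothetical caller-supplied function that performs one
    API request and returns the decoded JSON for a single company.
    """
    numbers = list(company_numbers)
    results = []
    for i, number in enumerate(numbers, start=1):
        results.append(fetch(number))
        # Pause once a full batch has been sent, unless nothing is left.
        if i % batch_size == 0 and i < len(numbers):
            sleep(pause)
    return results
```

Passing `sleep` as a parameter keeps the sketch testable without waiting. With 1,200 company numbers and the default settings there would be a single pause between the two segments.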

Because of these issues and the time delay, this loop is cached and does not run live in this report. The output data have been saved to csv and are read back in at the start of the next chunk of code.

Requests were made to https://api.company-information.service.gov.uk/company/[company_number]. The data schema is shown below in JSON format.

{
  "company_name" : "string",
  "company_number" : "string",
  "company_status" : "string",
  "date_of_creation" : "date",
  "company_type" : "string"
}

The data sets are appended, only the required variables are selected, and the result is filtered for active companies in the S1 area. The raw data consists of company name, number, status and type, date of creation, address data including postcode, and SIC codes. The data is written to API_Raw_Data.csv.

Further transformation involved renaming several variables, extracting substrings from the postcode, calculating company age, constructing an age grouping and label, and filtering out instances where company records were duplicated because of previous company names. This left a dataset of just under 500 distinct companies, currently active and in the S1 postcode area.
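The transformation steps above might look roughly like this in Python. It is a sketch under assumptions: the flat `postcode` key, the function names and the age bands are all illustrative, since the report does not state its field layout or band boundaries.

```python
from datetime import date


def outward_code(postcode: str) -> str:
    """Return the outward part of a UK postcode, e.g. 'S1 2BJ' -> 'S1'."""
    cleaned = postcode.strip().upper().replace(" ", "")
    return cleaned[:-3]  # the inward code is always the last three characters


def company_age_years(created: date, today: date) -> int:
    """Whole years between the creation date and today."""
    before_anniversary = (today.month, today.day) < (created.month, created.day)
    return today.year - created.year - int(before_anniversary)


def age_group(age: int) -> str:
    """Illustrative age bands only; the report's own boundaries are not given."""
    if age < 2:
        return "0-1 years"
    if age < 5:
        return "2-4 years"
    if age < 10:
        return "5-9 years"
    return "10+ years"


def prepare_companies(records, today: date):
    """Keep active S1 companies, one row per company number, with age fields."""
    seen = set()
    rows = []
    for record in records:
        if record["company_status"] != "active":
            continue
        if outward_code(record["postcode"]) != "S1":
            continue
        if record["company_number"] in seen:
            continue  # duplicate row, e.g. from a previous company name
        seen.add(record["company_number"])
        created = date.fromisoformat(record["date_of_creation"])
        age = company_age_years(created, today)
        rows.append({**record, "outward": "S1", "age": age, "age_group": age_group(age)})
    return rows
```

Comparing the whole outward code, not a string prefix, matters here: a prefix test on "S1" would also match S10, S11 and so on.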


Postcode data

Postcode data was then required to allow geocoding. Data for Yorkshire and Humber was downloaded in csv format from Doogal.

The csv data required some transformation: where the last 0 on the right of a latitude or longitude value had been lost, the value was padded back up to 9 characters. Outward area and sector were extracted from the postcode and the data was filtered to Sheffield S1 only. Two concatenated area fields were created for later aggregation and labelling, and all latitude and longitude fields were set to numeric.
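As an illustration of the padding and postcode splitting just described, here is a short Python sketch. The function names are hypothetical, and the splitting assumes each postcode contains a single space between its outward and inward parts.

```python
def pad_coordinate(text: str, width: int = 9) -> str:
    """Right-pad a coordinate string with zeros up to `width` characters.

    Restores trailing zeros that were dropped when the csv was written.
    """
    text = text.strip()
    if "." not in text:
        text += "."  # make sure padding lands after the decimal point
    return text.ljust(width, "0")


def split_postcode(postcode: str) -> dict:
    """Split a postcode such as 'S1 2BJ' into outward code and sector."""
    outward, inward = postcode.strip().upper().split()
    return {"outward": outward, "sector": f"{outward} {inward[0]}"}


# Both latitude and longitude reach six decimal places at nine characters.
latitude = float(pad_coordinate("53.38012"))
longitude = float(pad_coordinate("-1.47012"))
```

Padding does not change the numeric value once converted; it only matters where the coordinates are compared or joined as text.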

An extra level of postcode at the S1 1A* level was created. The maximum and minimum latitude and longitude for each level were calculated, giving a range together with the corresponding mid points.
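The range and mid point calculation can be sketched as below. This is a Python illustration with hypothetical names (`level_key`, `bounds_by_level`), and the sample co-ordinates in the usage lines are invented for the example, not taken from the postcode file.

```python
from collections import defaultdict


def level_key(postcode: str) -> str:
    """Key for the extra level, e.g. 'S1 1AA' -> 'S1 1A'."""
    outward, inward = postcode.strip().upper().split()
    return f"{outward} {inward[:2]}"


def bounds_by_level(rows, key=level_key):
    """Min, max and mid point of latitude and longitude for each level.

    `rows` is an iterable of (postcode, latitude, longitude) tuples.
    """
    groups = defaultdict(list)
    for postcode, lat, lon in rows:
        groups[key(postcode)].append((lat, lon))

    summary = {}
    for level, points in groups.items():
        lats = [lat for lat, _ in points]
        lons = [lon for _, lon in points]
        summary[level] = {
            "lat_min": min(lats),
            "lat_max": max(lats),
            "lat_mid": (min(lats) + max(lats)) / 2,
            "lon_min": min(lons),
            "lon_max": max(lons),
            "lon_mid": (min(lons) + max(lons)) / 2,
        }
    return summary


# Made-up sample points, for illustration only.
sample = [("S1 1AA", 53.380, -1.470), ("S1 1AB", 53.382, -1.468), ("S1 2BJ", 53.378, -1.472)]
levels = bounds_by_level(sample)
```

Passing a different `key` function gives the same summary at sector or outward level.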

The data was then filtered down to 489 ‘in use’ Sheffield postcodes with an outward code S1.

The postcode area S1 contains 66 postcode districts with 5 postcode sectors, each sector having approximately 3,000 addresses, and 489 active unit postcodes (see ONS Postal geography). MLSOA and LLSOA have been included in the data for context and aggregation.

These super output areas were designed to improve the reporting of small area statistics and are built up from groups of output areas (OA). Statistics for lower layer super output areas (LLSOA) and middle layer super output areas (MLSOA) were originally released in 2004 for England and Wales and were reviewed for the 2011 Census. An average LLSOA contains between 1,000 and 3,000 people and an MLSOA between 5,000 and 15,000.

The Index of Multiple Deprivation (IMD) is the official measure of relative deprivation experienced by people living in an area, calculated for each LLSOA. The indices rank each LLSOA from 1 to 32,844 across England, 1 being the most deprived and 32,844 the least deprived. The IMD combines seven weighted domains: Income Deprivation (22.5%), Employment Deprivation (22.5%), Education, Skills and Training Deprivation (13.5%), Health Deprivation and Disability (13.5%), Crime (9.3%), Barriers to Housing and Services (9.3%) and Living Environment Deprivation (9.3%). A more detailed explanation can be found in the English Indices of Deprivation.

Further investigation into obtaining postcode polygon data would increase the accuracy of locating any co-ordinate within a given postcode.


Combining postcode data with company data

The postcode data was then joined to the company data. Any companies with missing latitudes were filtered out, and latitude and longitude were set to numeric.
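A minimal Python sketch of this join follows. The key names (`postcode`, `latitude`, `longitude`) and the function name are assumptions for illustration; the report's own join is not shown.

```python
def join_companies_to_postcodes(companies, postcodes):
    """Attach co-ordinates to each company by exact postcode match.

    Companies with no matching postcode, or with a missing latitude,
    are dropped. Co-ordinates are converted to floats.
    """
    lookup = {row["postcode"]: row for row in postcodes}
    joined = []
    for company in companies:
        match = lookup.get(company["postcode"])
        if match is None or match.get("latitude") in (None, ""):
            continue
        joined.append(
            {
                **company,
                "latitude": float(match["latitude"]),
                "longitude": float(match["longitude"]),
            }
        )
    return joined
```

An exact match like this relies on both sources formatting postcodes identically, so normalising case and spacing beforehand is a sensible precaution.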

IMD has been plotted against the company age category to visualise whether new companies are favouring areas with low, medium or high IMD.



Crime data

UK Police crime data was obtained from the API for the three latest months available, October to December 2021, using the co-ordinates for Sheffield (latitude = 53.3811, longitude = -1.4701). The response was in the same format as the Companies House API data, so it required no new transformation other than re-formatting the latitude and longitude to numeric and adding a count variable.

Separate requests were made for each month and then combined into one dataset. A future development would be to calculate the latest available month, which is published about 3 weeks after the month's end, and to construct a list of months with a loop to request the crime data for the latest n months. It would also be interesting to compare crime statistics from 2019 with those from 2020.
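The month list proposed as a future development could be built along these lines. This Python sketch makes two assumptions: the 21-day lag is a literal reading of "about 3 weeks", and the URL uses the street-level crimes endpoint documented by data.police.uk, since the report does not say which endpoint it called. The function names are hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumed endpoint: street-level crimes around a point, per data.police.uk docs.
BASE_URL = "https://data.police.uk/api/crimes-street/all-crime"


def latest_months(today: date, n: int, lag_days: int = 21) -> list:
    """Return the latest n available months as 'YYYY-MM', oldest first.

    A month is treated as available once `lag_days` have passed since it ended.
    """
    cutoff = today - timedelta(days=lag_days)
    year, month = cutoff.year, cutoff.month
    months = []
    for _ in range(n):
        month -= 1  # step back one calendar month
        if month == 0:
            year, month = year - 1, 12
        months.append(f"{year:04d}-{month:02d}")
    return months[::-1]


def crime_url(lat: float, lng: float, month: str) -> str:
    """Build the request URL for one month around one point."""
    return f"{BASE_URL}?{urlencode({'lat': lat, 'lng': lng, 'date': month})}"


# One URL per month, using the Sheffield co-ordinates from the report.
urls = [crime_url(53.3811, -1.4701, m) for m in latest_months(date(2022, 1, 25), 3)]
```

For a run date of 25 January 2022 this gives October, November and December 2021, matching the period covered by the report.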

The crime data was indexed and joined to the postcode data. Specifying the Sheffield co-ordinates returned data from approximately a one mile radius, which therefore contained some records outside the S1 perimeter. The co-ordinates of each offence were classed as either in or out of the specified S1 range, set between latitude 53.37154 and 53.38618 and longitude -1.481137 and -1.460491. Those outside this range were filtered out.
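The in-or-out test is a simple bounding box check, sketched here in Python. The bounds are the values quoted above; the names `S1_BOUNDS`, `in_s1` and `filter_to_s1`, and the tuple layout of the crime records, are illustrative assumptions.

```python
# Bounding box for S1, using the limits quoted in the report.
S1_BOUNDS = {
    "lat_min": 53.37154,
    "lat_max": 53.38618,
    "lon_min": -1.481137,
    "lon_max": -1.460491,
}


def in_s1(lat: float, lon: float, bounds: dict = S1_BOUNDS) -> bool:
    """True when the point lies inside the S1 latitude and longitude range."""
    return (
        bounds["lat_min"] <= lat <= bounds["lat_max"]
        and bounds["lon_min"] <= lon <= bounds["lon_max"]
    )


def filter_to_s1(crimes):
    """Keep only (crime_id, lat, lon) tuples that fall inside the box."""
    return [crime for crime in crimes if in_s1(crime[1], crime[2])]
```

Because the longitudes are negative, the more negative value is the lower bound. A rectangle is only an approximation of the S1 boundary, so some points near the edges may be kept or dropped incorrectly.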

Further transformation was required to join this data with the postcode data, as the latitude and longitude of each crime could not be matched directly to the lat/long of the postcodes. The datasets were joined by a Cartesian join, so every crime was matched with every postcode. The difference between the crime and postcode latitudes was added to the difference between the longitudes to give a total difference. The data was then filtered to keep, for each crime, only the record with the least total difference and hence the closest postcode match.
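The nearest-postcode matching can be sketched in Python as below. One assumption is made explicit: the differences are taken as absolute values before being added, which the text implies but does not state. The function name and tuple layouts are hypothetical.

```python
from itertools import product


def nearest_postcodes(crimes, postcodes):
    """Match each crime to the postcode with the smallest total difference.

    `crimes` holds (crime_id, lat, lon) tuples and `postcodes` holds
    (postcode, lat, lon) tuples. Every crime is paired with every postcode
    (a Cartesian join) and the pair with the least
    |lat difference| + |lon difference| is kept.
    """
    best = {}
    for (crime_id, c_lat, c_lon), (postcode, p_lat, p_lon) in product(crimes, postcodes):
        # Assumption: absolute differences, so opposite signs cannot cancel out.
        total = abs(c_lat - p_lat) + abs(c_lon - p_lon)
        if crime_id not in best or total < best[crime_id][1]:
            best[crime_id] = (postcode, total)
    return {crime_id: postcode for crime_id, (postcode, _) in best.items()}
```

The number of pairs grows as crimes multiplied by postcodes, which is manageable for 489 postcodes but would need rethinking for larger areas. A degree of longitude also covers less ground than a degree of latitude at Sheffield's latitude, so the summed difference is a rough proxy for distance, not a true one.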

Data was then summarised at various levels for the maps and plots.

Crimes by month show a decrease in the numbers of offences in November and December during lockdown in the city centre.


Now that the crime data is geocoded and mapped, the category of each crime can be seen.

par(mar = c(4, 4, .1, .1))
plot(plot_stacked)
plot(plot_category)


Company and crime data

Company and crime data was combined in long format, i.e. stacked, so that both crimes and companies could be mapped together.
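Stacking the two datasets into long format might look like this in Python. The column names (`type`, `label`, `company_name`, `category`) and the function name are assumptions chosen for the example.

```python
def stack_for_mapping(companies, crimes):
    """Combine companies and crimes into one long list with a shared layout.

    Each output row carries a `type` marker so a single map layer can
    colour or filter points by source.
    """
    rows = [
        {
            "type": "company",
            "label": company["company_name"],
            "latitude": company["latitude"],
            "longitude": company["longitude"],
        }
        for company in companies
    ]
    rows += [
        {
            "type": "crime",
            "label": crime["category"],
            "latitude": crime["latitude"],
            "longitude": crime["longitude"],
        }
        for crime in crimes
    ]
    return rows
```

Keeping one row per point with a common set of columns lets a plotting library treat both sources as a single dataset.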

Data visualisation

par(mar = c(4, 4, .1, .1))
plot(plot_IMD)
plot(plot_MLSOA)

It is now possible to view company locations alongside the level of crime in Sheffield city centre.


Conclusion

This data would allow many businesses to monitor crime levels in their local areas. Businesses offering security services could identify potential client companies, or groups of companies, located in areas with high crime rates, allowing them to offer consolidated services, e.g. security patrols.

Data could be integrated with internal company site data and other metrics. Risk

Developments... ___