Abstract

A surface-level analysis of San Francisco crime statistics in relation to geographical location. How does a crime category vary by neighborhood? Which categories are most common in the city center? In which areas or neighborhoods are robberies or thefts most common? This analysis is part of the course Communicating Data Science Results, offered on Coursera in cooperation with the University of Washington.

Data

The raw data is publicly available in the course GitHub repository.

The data was collected in Summer 2014 and has 13 variables, listed below (Figure 1a). Each crime incident is labeled with its category. There are 34 different categories in the dataset.

Variable      Description
IncidntNum    crime incident ID
Time          time of the crime incident
Date          date of the crime incident
Category      category of the crime incident
Descript      detailed description of the crime incident
DayOfWeek     day of the week
PdDistrict    name of the Police Department District
Resolution    how the crime incident was resolved
Address       approximate street address of the crime incident
X             longitude
Y             latitude
Location      (longitude, latitude)
PdId          Police Department District ID

Figure 1a.

Data pre-processing

During data cleaning, most of the columns are dropped. Only Category, X and Y remain, and these are sufficient to plot the crimes on a map of San Francisco.
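The column selection can be sketched in Python with the standard `csv` module. This is a minimal sketch, not the author's actual code (the original tooling is not stated). It assumes the raw file has a header row with the column names from Figure 1a. The `Incident` and `load_incidents` names are hypothetical.

```python
import csv
from typing import Iterable, NamedTuple


class Incident(NamedTuple):
    """One cleaned crime record: its category and map coordinates."""
    category: str
    x: float  # longitude (column X)
    y: float  # latitude (column Y)


def load_incidents(lines: Iterable[str]) -> list[Incident]:
    """Parse raw CSV lines and keep only the Category, X and Y columns.

    Assumes the first line is a header naming the columns as in Figure 1a;
    every other column (IncidntNum, Descript, PdDistrict, ...) is dropped.
    """
    reader = csv.DictReader(lines)
    return [
        Incident(row["Category"], float(row["X"]), float(row["Y"]))
        for row in reader
    ]
```

A file object yields lines when iterated, so a local copy of the course CSV can be passed straight in. To do this, open the file with `with open(path, newline="") as f:` and then call `load_incidents(f)` inside that block.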

This analysis focuses mostly on the most frequent crime categories in San Francisco. The histogram below (Figure 2a) lists the frequency of every crime category. The analysis that follows concentrates on the categories with 100 or more incidents. From this point forward, these are referred to as high-crime categories.

Figure 2a.
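Splitting the categories into high-crime and rare-crime groups amounts to two steps. First, count the incidents per category. Then compare each count with the 100-incident threshold. Below is a minimal Python sketch; the name `split_categories` is hypothetical.

```python
from collections import Counter
from typing import Iterable

HIGH_CRIME_THRESHOLD = 100  # categories with >= 100 incidents count as high-crime


def split_categories(
    categories: Iterable[str], threshold: int = HIGH_CRIME_THRESHOLD
) -> tuple[list[str], list[str]]:
    """Split crime categories into high-crime and rare-crime groups.

    A category is high-crime when it has at least `threshold` incidents;
    every other category is rare-crime. Both lists are ordered by
    descending frequency, the order a frequency histogram would show.
    """
    counts = Counter(categories)
    ranked = counts.most_common()  # [(category, count), ...], most frequent first
    high = [cat for cat, n in ranked if n >= threshold]
    rare = [cat for cat, n in ranked if n < threshold]
    return high, rare
```

The function takes one category label per incident, for example a generator over the Category column of the cleaned data.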

Analysis

For reference, a map of the San Francisco neighborhoods is shown below. It was copied from a public resource, which can be found here. The later analysis refers to the district boundaries on this map.

Figure 3a.

The maps below (Figure 3b) plot the high-crime incidents by their X and Y coordinates. Each high-crime category is drawn on its own map, producing 16 maps in total. The result indicates a substantial relationship between the type of crime and its geographical location.

Figure 3b.
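Producing one map per category means grouping the coordinates by category before plotting. The sketch below does the grouping with the standard library and picks a near-square panel layout, so 16 categories give a 4 by 4 grid. The function names are hypothetical. The layout is an assumption, since the original does not say how its maps were arranged. The plotting call itself (for example, one scatter plot per panel) is left out.

```python
import math
from collections import defaultdict
from typing import Iterable


def points_by_category(
    records: Iterable[tuple[str, float, float]], keep: set[str]
) -> dict[str, list[tuple[float, float]]]:
    """Group (category, x, y) records into one point list per kept category."""
    groups = defaultdict(list)
    for category, x, y in records:
        if category in keep:
            groups[category].append((x, y))
    return dict(groups)


def grid_shape(n_panels: int) -> tuple[int, int]:
    """Return a near-square (rows, cols) layout with room for n_panels maps."""
    if n_panels <= 0:
        raise ValueError("n_panels must be positive")
    cols = math.ceil(math.sqrt(n_panels))
    rows = math.ceil(n_panels / cols)
    return rows, cols
```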

Below, theft (including vehicle theft) and robbery are drawn on their own maps. The heat-map overlay further shows how crime incidents gradually accumulate toward the northeast of San Francisco.

Figure 3c.
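A heat-map overlay like this can be approximated by counting incidents per cell of a regular grid laid over the map. The original does not say how its heat map was produced. This sketch uses plain binned counts rather than, for instance, kernel density estimation. The name `heat_grid` is hypothetical.

```python
from typing import Iterable


def heat_grid(
    points: Iterable[tuple[float, float]],
    bounds: tuple[float, float, float, float],
    bins: int,
) -> list[list[int]]:
    """Count incidents per cell of a bins x bins grid over the map.

    bounds is (min_x, max_x, min_y, max_y) with min < max on both axes.
    Row 0 is the southernmost band and column 0 the westernmost, so large
    counts in the last row and column mean many incidents in the northeast.
    Points outside the bounds are ignored.
    """
    min_x, max_x, min_y, max_y = bounds
    if not (min_x < max_x and min_y < max_y):
        raise ValueError("bounds must satisfy min < max on both axes")
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Scale each coordinate to a cell index; points on the max edge go in the last cell.
        col = min(int((x - min_x) / (max_x - min_x) * bins), bins - 1)
        row = min(int((y - min_y) / (max_y - min_y) * bins), bins - 1)
        grid[row][col] += 1
    return grid
```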

For rare crime (categories with fewer than 100 incidents in Summer 2014), no clear pattern seems to emerge. The occurrences are distributed somewhat randomly across the city. However, a case could be made that the east side sees more crime incidents than the west, even among the rare-crime categories.

Figure 3d.
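The east-versus-west observation can be checked by counting rare-crime incidents on either side of a dividing longitude. The dividing line is an assumption. The original does not define where the east side begins, so the reader must choose `split_x`, for example as the midpoint of the city's longitude range. The function name is hypothetical. Raw counts alone do not establish a statistically significant difference.

```python
from typing import Iterable


def east_west_counts(
    points: Iterable[tuple[float, float]], split_x: float
) -> tuple[int, int]:
    """Count incidents east and west of the longitude split_x.

    San Francisco longitudes are negative, so 'east' means x > split_x.
    Points exactly on the dividing line are counted as west.
    """
    east = west = 0
    for x, _y in points:
        if x > split_x:
            east += 1
        else:
            west += 1
    return east, west
```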

Conclusions

When the crime data is examined by the nature of the reported activity, apparent hotspots of crime emerge in particular areas of San Francisco. Figure 3b suggests that vehicle theft and suspicious-activity incidents occur across the whole city but are more frequent in the Tenderloin district. Figure 3c further reinforces this suggestion. More serious crimes, such as assaults, drug violations, robberies and warrants, are also more concentrated around the Tenderloin district. Theft also seems to be more frequent in the Mission district. Rare-crime incidents do not seem to have an apparent geographical preference, as indicated by Figure 3d. This, however, is most likely due to the limited amount of data in those categories.