The dataset is based on Police Department incidents in the city of San Francisco for the period 01/01/2003 through 06/12/2017. Note that non-criminal incidents have been removed.
Larceny/Theft is the most prevalent category, and Grand Theft from Locked Auto is the most prevalent incident description. Below are the 20 most frequent combinations of Category and Incident Description in the dataset.
## Source: local data frame [20 x 3]
## Groups: Category [10]
##
##    Category       Descript                                       n
##    <chr>          <chr>                                      <int>
##  1 LARCENY/THEFT  GRAND THEFT FROM LOCKED AUTO              159505
##  2 ASSAULT        BATTERY                                    63742
##  3 VEHICLE THEFT  STOLEN AUTOMOBILE                          62457
##  4 OTHER OFFENSES DRIVERS LICENSE, SUSPENDED OR REVOKED      61098
##  5 WARRANTS       WARRANT ARREST                             54032
##  6 SUSPICIOUS OCC SUSPICIOUS OCCURRENCE                      50151
##  7 LARCENY/THEFT  PETTY THEFT FROM LOCKED AUTO               48354
##  8 VANDALISM      MALICIOUS MISCHIEF, VANDALISM OF VEHICLES  41954
##  9 LARCENY/THEFT  PETTY THEFT OF PROPERTY                    41449
## 10 VANDALISM      MALICIOUS MISCHIEF, VANDALISM              40762
## 11 OTHER OFFENSES TRAFFIC VIOLATION                          36657
## 12 ASSAULT        THREATS AGAINST LIFE                       33182
## 13 WARRANTS       ENROUTE TO OUTSIDE JURISDICTION            27289
## 14 LARCENY/THEFT  GRAND THEFT OF PROPERTY                    26905
## 15 LARCENY/THEFT  PETTY THEFT FROM A BUILDING                24150
## 16 LARCENY/THEFT  PETTY THEFT SHOPLIFTING                    22900
## 17 MISSING PERSON FOUND PERSON                               22827
## 18 DRUG/NARCOTIC  POSSESSION OF NARCOTICS PARAPHERNALIA      22117
## 19 LARCENY/THEFT  GRAND THEFT FROM A BUILDING                21495
## 20 FRAUD          CREDIT CARD, THEFT BY USE OF               20932
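A ranking like the one above comes down to counting how often each (Category, Descript) pair occurs and keeping the most frequent pairs. The following is a minimal Python sketch of that counting step, not the R code that produced the output. The sample rows and the function name `top_pairs` are made up for illustration.

```python
from collections import Counter

# Hypothetical sample rows: (Category, Descript) pairs, not real dataset counts.
incidents = [
    ("LARCENY/THEFT", "GRAND THEFT FROM LOCKED AUTO"),
    ("LARCENY/THEFT", "GRAND THEFT FROM LOCKED AUTO"),
    ("ASSAULT", "BATTERY"),
    ("LARCENY/THEFT", "PETTY THEFT FROM LOCKED AUTO"),
    ("ASSAULT", "BATTERY"),
    ("LARCENY/THEFT", "GRAND THEFT FROM LOCKED AUTO"),
]

def top_pairs(rows, n=20):
    """Return the n most frequent (Category, Descript) pairs with their counts."""
    return Counter(rows).most_common(n)

# Print the pairs from most to least frequent.
for (category, descript), count in top_pairs(incidents, n=3):
    print(category, descript, count)
```

Counting pairs rather than each column separately is what lets the same category (for example Larceny/Theft) appear on several rows of the ranking.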
Larceny/Theft and Other Offenses are consistently the most prevalent crime categories throughout the day. The exception is 1am and 2am, when Assault displaces Other Offenses in the top two. The output below lists the two most frequent categories for each hour from 0 through 9.
## Selecting by Count
## Source: local data frame [20 x 3]
## Groups: Hour [10]
##
## Hour Category Count
## <dbl> <chr> <int>
## 1 0 OTHER OFFENSES 17844
## 2 0 LARCENY/THEFT 17541
## 3 1 LARCENY/THEFT 10807
## 4 1 ASSAULT 8764
## 5 2 ASSAULT 7955
## 6 2 LARCENY/THEFT 7120
## 7 3 OTHER OFFENSES 4624
## 8 3 LARCENY/THEFT 4440
## 9 4 OTHER OFFENSES 3472
## 10 4 LARCENY/THEFT 2872
## 11 5 LARCENY/THEFT 2873
## 12 5 OTHER OFFENSES 2741
## 13 6 LARCENY/THEFT 4558
## 14 6 OTHER OFFENSES 3984
## 15 7 OTHER OFFENSES 7918
## 16 7 LARCENY/THEFT 7222
## 17 8 OTHER OFFENSES 12317
## 18 8 LARCENY/THEFT 12159
## 19 9 LARCENY/THEFT 14360
## 20 9 OTHER OFFENSES 13891
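The per-hour ranking above is a "top k within each group" computation: count categories separately for each hour, then keep the k largest counts per hour. Below is a hedged Python sketch of that idea. The helper `top_k_per_group` and the sample rows are hypothetical and are not taken from the original analysis code.

```python
from collections import Counter, defaultdict

def top_k_per_group(rows, k=2):
    """rows: iterable of (group, label) tuples.

    Returns {group: [(label, count), ...]} holding the k most common
    labels for each group, with groups in sorted order.
    """
    counts = defaultdict(Counter)
    for group, label in rows:
        counts[group][label] += 1
    return {group: c.most_common(k) for group, c in sorted(counts.items())}

# Hypothetical (Hour, Category) rows, not real dataset counts.
rows = (
    [(0, "OTHER OFFENSES")] * 3 + [(0, "LARCENY/THEFT")] * 2 + [(0, "ASSAULT")]
    + [(1, "LARCENY/THEFT")] * 3 + [(1, "ASSAULT")] * 2 + [(1, "OTHER OFFENSES")]
)

for hour, ranking in top_k_per_group(rows, k=2).items():
    print(hour, ranking)
```

The same helper works for the day-of-week rankings by passing (DayOfWeek, Descript) tuples and k=3.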
Grand Theft from Locked Auto is consistently among the most prominent incident descriptions throughout the day. Other common ones are Battery, Driver’s License - Suspended or Revoked, Warrant Arrest, and Stolen Automobile.
Driver’s License - Suspended or Revoked is prevalent during the early hours of the morning. Stolen Automobile is common during the late afternoon.
## Selecting by Count
## Source: local data frame [20 x 3]
## Groups: Hour [10]
##
## Hour Descript Count
## <dbl> <chr> <int>
## 1 0 GRAND THEFT FROM LOCKED AUTO 6496
## 2 0 DRIVERS LICENSE, SUSPENDED OR REVOKED 3659
## 3 1 GRAND THEFT FROM LOCKED AUTO 3873
## 4 1 BATTERY 3216
## 5 2 GRAND THEFT FROM LOCKED AUTO 2778
## 6 2 BATTERY 2761
## 7 3 GRAND THEFT FROM LOCKED AUTO 1810
## 8 3 BATTERY 1163
## 9 4 GRAND THEFT FROM LOCKED AUTO 1105
## 10 4 DRIVERS LICENSE, SUSPENDED OR REVOKED 831
## 11 5 GRAND THEFT FROM LOCKED AUTO 1048
## 12 5 MALICIOUS MISCHIEF, VANDALISM 561
## 13 6 GRAND THEFT FROM LOCKED AUTO 1635
## 14 6 WARRANT ARREST 909
## 15 7 GRAND THEFT FROM LOCKED AUTO 2386
## 16 7 WARRANT ARREST 1734
## 17 8 GRAND THEFT FROM LOCKED AUTO 3545
## 18 8 SUSPICIOUS OCCURRENCE 2447
## 19 9 GRAND THEFT FROM LOCKED AUTO 4289
## 20 9 SUSPICIOUS OCCURRENCE 2754
When we switch our focus to crime during the week, we see that LARCENY/THEFT and OTHER OFFENSES are again the most common crime categories.
Grand Theft from Locked Auto is consistently the most common incident description every day of the week, typically followed by Stolen Automobile and Battery (on Monday, Driver’s License, Suspended or Revoked ranks second).
## Selecting by Count
## Source: local data frame [21 x 3]
## Groups: DayOfWeek [7]
##
## DayOfWeek Descript Count
## <chr> <chr> <int>
## 1 Friday GRAND THEFT FROM LOCKED AUTO 24450
## 2 Friday STOLEN AUTOMOBILE 10008
## 3 Friday BATTERY 9484
## 4 Monday GRAND THEFT FROM LOCKED AUTO 21251
## 5 Monday DRIVERS LICENSE, SUSPENDED OR REVOKED 8416
## 6 Monday STOLEN AUTOMOBILE 8377
## 7 Saturday GRAND THEFT FROM LOCKED AUTO 25028
## 8 Saturday BATTERY 10231
## 9 Saturday STOLEN AUTOMOBILE 9550
## 10 Sunday GRAND THEFT FROM LOCKED AUTO 22858
## # ... with 11 more rows
The police districts with the highest number of incidents are Southern and Mission. Below we include the Crime Category and Incident Description distributions for each of the two districts.
We see that Larceny/Theft and Vehicle Theft peak at noon and again at 6pm. All categories dip after 10pm and begin picking up in the morning.
Larceny/Theft increases on Friday and Saturday and hits a low on Sunday and Monday. Assault remains steady during the week and is elevated throughout the weekend. Drug/Narcotic increases on Wednesdays.
Larceny/Theft peaks during the warm months (July through October) and decreases during the winter. Assault remains steady for most of the year but dips during November and December (the holidays).
We see that Grand Theft from Locked Auto peaks at noon and hits a high (twice its level at noon) later at 7pm. All descriptions dip after 10pm and begin picking up in the morning.
Grand Theft from Locked Auto increases during the weekend. Driver’s License, Suspended or Revoked is highest on Wednesdays.
All Descriptions except for Grand Theft from Locked Auto seem to slightly decrease towards the end of the year. Although Grand Theft from Locked Auto fluctuates throughout the year, it is consistently the most common incident description. Overall, the remaining descriptions do not experience much variation throughout the year.
We’ve included a few heatmaps relating to some of the top Incident Descriptions. The heatmaps show that the core of criminal activity is concentrated in one part of town, the area which appears red in all of the maps below, and they also reveal other patterns.
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=san+francisco&zoom=12&size=640x640&scale=2&maptype=roadmap&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=san%20francisco&sensor=false
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead
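To make the idea behind a heatmap concrete: one simple way to find where incidents concentrate is to lay a regular grid over the city and count the points falling in each cell. The sketch below does this in Python. It is a simplified stand-in, since the maps in this report were drawn over a Google map tile and may well use a smoothed density estimate instead. The bounding-box origin, cell size, and sample coordinates are assumptions chosen for illustration.

```python
from collections import Counter

def bin_points(points, lon_min, lat_min, cell_size):
    """Count (lon, lat) points per grid cell.

    A cell is identified by (row, col), where
    col = floor((lon - lon_min) / cell_size) and
    row = floor((lat - lat_min) / cell_size).
    """
    grid = Counter()
    for lon, lat in points:
        col = int((lon - lon_min) // cell_size)
        row = int((lat - lat_min) // cell_size)
        grid[(row, col)] += 1
    return grid

# Hypothetical coordinates roughly inside San Francisco (not real incidents).
points = [
    (-122.415, 37.775),
    (-122.414, 37.776),
    (-122.485, 37.765),
]

# Assumed grid origin (south-west corner) and cell size in degrees.
grid = bin_points(points, lon_min=-122.52, lat_min=37.70, cell_size=0.01)

# The cell with the highest count corresponds to the "red" area of a heatmap.
hottest_cell, hottest_count = grid.most_common(1)[0]
print(hottest_cell, hottest_count)
```

Running one such grid per Incident Description would show whether the hot cells coincide across descriptions or shift between them.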
The challenge: Classifying a two-dimensional variable
Approach 1: Building two separate models, one to predict Longitude and one to predict Latitude. The downside: the models only manage to predict the obvious, namely that most of the data points are in downtown San Francisco (the red area in the heatmaps above).
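As an illustration of Approach 1 and its downside, the sketch below fits two independent one-variable least-squares lines, one for Longitude and one for Latitude, using hour of day as the only predictor. Least squares and the single predictor are assumptions made for the example, and the report does not state which algorithm or features were actually used. The data points are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def fit_two_models(hours, lons, lats):
    """Fit one model per coordinate, ignoring any link between the two."""
    return fit_line(hours, lons), fit_line(hours, lats)

def predict(models, hour):
    (lon_slope, lon_icpt), (lat_slope, lat_icpt) = models
    return lon_slope * hour + lon_icpt, lat_slope * hour + lat_icpt

# Synthetic data in which the hour carries no information about location.
hours = [0, 6, 12, 18]
lons = [-122.40, -122.42, -122.42, -122.40]
lats = [37.77, 37.79, 37.79, 37.77]

models = fit_two_models(hours, lons, lats)
print(predict(models, 3))
print(predict(models, 15))
```

When the predictor carries little information, both fitted slopes are close to zero and every prediction collapses to the centroid of the data, which is the "predicting the obvious" behaviour described above.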
Approach 2: Predicting addresses, restricted to addresses which have at least 100 historical reports. Because the model takes Incident Description as one of the predictor variables, we need to build a separate model for each of the categories. Running the classification algorithms is extremely computationally expensive.
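The structure of Approach 2 can be sketched as two steps: keep only addresses with enough historical reports, then fit one model per category. In the Python sketch below the "model" is a simple lookup of the most frequent address per hour, used as a placeholder for a real classifier. The record fields (`address`, `category`, `hour`), the function names, and the sample addresses are all hypothetical.

```python
from collections import Counter, defaultdict

def frequent_addresses(records, min_reports=100):
    """Return the set of addresses with at least min_reports records."""
    counts = Counter(r["address"] for r in records)
    return {addr for addr, n in counts.items() if n >= min_reports}

def fit_per_category(records, min_reports=100):
    """Build one lookup 'model' per category.

    Each model maps an hour to the most frequent address for that
    category and hour, considering only frequently reported addresses.
    """
    keep = frequent_addresses(records, min_reports)
    tallies = defaultdict(lambda: defaultdict(Counter))
    for r in records:
        if r["address"] in keep:
            tallies[r["category"]][r["hour"]][r["address"]] += 1
    return {
        category: {hour: c.most_common(1)[0][0] for hour, c in by_hour.items()}
        for category, by_hour in tallies.items()
    }

# Made-up records; the threshold is lowered to 3 to suit the tiny sample.
records = (
    [{"address": "100 Block of EXAMPLE ST", "category": "LARCENY/THEFT", "hour": 12}] * 3
    + [{"address": "200 Block of RARE ST", "category": "LARCENY/THEFT", "hour": 12}]
    + [{"address": "100 Block of EXAMPLE ST", "category": "ASSAULT", "hour": 1}]
)

models = fit_per_category(records, min_reports=3)
print(models)
```

Even in this toy form, the cost grows with the number of categories times the number of candidate addresses, which hints at why a full classifier over this many classes is expensive to run.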