The Data Source

The dataset consists of San Francisco Police Department incident reports for the period 01/01/2003 through 06/12/2017. Non-criminal incidents have been removed.

Most Prevalent Crime Categories and Incident Descriptions Across Time Periods

Overall: 2003 - 2017

Larceny/Theft is the most prevalent category, and Grand Theft from Locked Auto is the most prevalent incident description. Below are the 20 most frequent combinations of category and incident description in the dataset. Together they span 10 distinct categories.

## Source: local data frame [20 x 3]
## Groups: Category [10]
## 
##          Category                                  Descript      n
##             <chr>                                     <chr>  <int>
## 1   LARCENY/THEFT              GRAND THEFT FROM LOCKED AUTO 159505
## 2         ASSAULT                                   BATTERY  63742
## 3   VEHICLE THEFT                         STOLEN AUTOMOBILE  62457
## 4  OTHER OFFENSES     DRIVERS LICENSE, SUSPENDED OR REVOKED  61098
## 5        WARRANTS                            WARRANT ARREST  54032
## 6  SUSPICIOUS OCC                     SUSPICIOUS OCCURRENCE  50151
## 7   LARCENY/THEFT              PETTY THEFT FROM LOCKED AUTO  48354
## 8       VANDALISM MALICIOUS MISCHIEF, VANDALISM OF VEHICLES  41954
## 9   LARCENY/THEFT                   PETTY THEFT OF PROPERTY  41449
## 10      VANDALISM             MALICIOUS MISCHIEF, VANDALISM  40762
## 11 OTHER OFFENSES                         TRAFFIC VIOLATION  36657
## 12        ASSAULT                      THREATS AGAINST LIFE  33182
## 13       WARRANTS           ENROUTE TO OUTSIDE JURISDICTION  27289
## 14  LARCENY/THEFT                   GRAND THEFT OF PROPERTY  26905
## 15  LARCENY/THEFT               PETTY THEFT FROM A BUILDING  24150
## 16  LARCENY/THEFT                   PETTY THEFT SHOPLIFTING  22900
## 17 MISSING PERSON                              FOUND PERSON  22827
## 18  DRUG/NARCOTIC     POSSESSION OF NARCOTICS PARAPHERNALIA  22117
## 19  LARCENY/THEFT               GRAND THEFT FROM A BUILDING  21495
## 20          FRAUD              CREDIT CARD, THEFT BY USE OF  20932
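The table above ranks (Category, Descript) pairs by how often they occur. Here is a minimal Python sketch of the same tally. It assumes each incident is a dict with `Category` and `Descript` keys (the field names are taken from the output, and the sample records are made up):

```python
from collections import Counter

def top_pairs(records, n=20):
    """Count (Category, Descript) pairs and return the n most frequent."""
    counts = Counter((r["Category"], r["Descript"]) for r in records)
    # most_common() orders pairs by count, highest first
    return counts.most_common(n)

# Hypothetical records for illustration; not taken from the dataset
sample = [
    {"Category": "LARCENY/THEFT", "Descript": "GRAND THEFT FROM LOCKED AUTO"},
    {"Category": "LARCENY/THEFT", "Descript": "GRAND THEFT FROM LOCKED AUTO"},
    {"Category": "ASSAULT", "Descript": "BATTERY"},
]
for (category, descript), n in top_pairs(sample, n=2):
    print(category, descript, n)
```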

Hourly

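Larceny/Theft and Other Offenses are consistently the two most prevalent crime categories throughout the day, with one exception. At 1am and 2am, Assault displaces Other Offenses from the top two, and at 2am it ranks first. The output below shows the top two categories for each hour from 0 (midnight) through 9.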

## Selecting by Count
## Source: local data frame [20 x 3]
## Groups: Hour [10]
## 
##     Hour       Category Count
##    <dbl>          <chr> <int>
## 1      0 OTHER OFFENSES 17844
## 2      0  LARCENY/THEFT 17541
## 3      1  LARCENY/THEFT 10807
## 4      1        ASSAULT  8764
## 5      2        ASSAULT  7955
## 6      2  LARCENY/THEFT  7120
## 7      3 OTHER OFFENSES  4624
## 8      3  LARCENY/THEFT  4440
## 9      4 OTHER OFFENSES  3472
## 10     4  LARCENY/THEFT  2872
## 11     5  LARCENY/THEFT  2873
## 12     5 OTHER OFFENSES  2741
## 13     6  LARCENY/THEFT  4558
## 14     6 OTHER OFFENSES  3984
## 15     7 OTHER OFFENSES  7918
## 16     7  LARCENY/THEFT  7222
## 17     8 OTHER OFFENSES 12317
## 18     8  LARCENY/THEFT 12159
## 19     9  LARCENY/THEFT 14360
## 20     9 OTHER OFFENSES 13891
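The "Selecting by Count" message suggests that the top rows per hour were picked with dplyr's `top_n()`, which keeps ties. Below is a plain-Python sketch of that top-k-per-group step. It assumes records with `Hour` and `Category` fields, and the sample data are hypothetical:

```python
from collections import Counter, defaultdict

def top_k_per_group(records, group_key, item_key, k=2):
    """For each group, keep the items whose count ranks in the top k.

    An item's rank is 1 plus the number of items in the same group with a
    strictly larger count, so tied items are all kept.
    """
    counts = Counter((r[group_key], r[item_key]) for r in records)
    by_group = defaultdict(list)
    for (group, item), c in counts.items():
        by_group[group].append((item, c))
    result = {}
    for group in sorted(by_group):
        items = by_group[group]
        kept = [
            (item, c) for item, c in items
            if 1 + sum(other > c for _, other in items) <= k
        ]
        kept.sort(key=lambda pair: pair[1], reverse=True)  # highest count first
        result[group] = kept
    return result

# Hypothetical records for illustration
sample = (
    [{"Hour": 1, "Category": "LARCENY/THEFT"}] * 3
    + [{"Hour": 1, "Category": "ASSAULT"}] * 2
    + [{"Hour": 1, "Category": "OTHER OFFENSES"}]
    + [{"Hour": 2, "Category": "ASSAULT"}] * 2
    + [{"Hour": 2, "Category": "LARCENY/THEFT"}]
    + [{"Hour": 2, "Category": "OTHER OFFENSES"}]
)
result = top_k_per_group(sample, "Hour", "Category", k=2)
for hour, rows in result.items():
    print(hour, rows)
```

With k=2, hour 2 keeps three rows because Larceny/Theft and Other Offenses tie for second place.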

Grand Theft from Locked Auto is the most common incident description in every hour shown below. Other common descriptions are Battery, Driver’s License, Suspended or Revoked, Warrant Arrest, Suspicious Occurrence, and Stolen Automobile.
Driver’s License, Suspended or Revoked is prevalent in the early hours of the morning (midnight and 4am), Battery ranks second from 1am to 3am, and Warrant Arrest and Suspicious Occurrence rank second from 6am to 9am. Stolen Automobile is common during the late afternoon.

## Selecting by Count
## Source: local data frame [20 x 3]
## Groups: Hour [10]
## 
##     Hour                              Descript Count
##    <dbl>                                 <chr> <int>
## 1      0          GRAND THEFT FROM LOCKED AUTO  6496
## 2      0 DRIVERS LICENSE, SUSPENDED OR REVOKED  3659
## 3      1          GRAND THEFT FROM LOCKED AUTO  3873
## 4      1                               BATTERY  3216
## 5      2          GRAND THEFT FROM LOCKED AUTO  2778
## 6      2                               BATTERY  2761
## 7      3          GRAND THEFT FROM LOCKED AUTO  1810
## 8      3                               BATTERY  1163
## 9      4          GRAND THEFT FROM LOCKED AUTO  1105
## 10     4 DRIVERS LICENSE, SUSPENDED OR REVOKED   831
## 11     5          GRAND THEFT FROM LOCKED AUTO  1048
## 12     5         MALICIOUS MISCHIEF, VANDALISM   561
## 13     6          GRAND THEFT FROM LOCKED AUTO  1635
## 14     6                        WARRANT ARREST   909
## 15     7          GRAND THEFT FROM LOCKED AUTO  2386
## 16     7                        WARRANT ARREST  1734
## 17     8          GRAND THEFT FROM LOCKED AUTO  3545
## 18     8                 SUSPICIOUS OCCURRENCE  2447
## 19     9          GRAND THEFT FROM LOCKED AUTO  4289
## 20     9                 SUSPICIOUS OCCURRENCE  2754

Days of the Week

When we break incidents down by day of the week, Larceny/Theft and Other Offenses are again the most common crime categories.

Grand Theft from Locked Auto is consistently the most common incident description on every day of the week. It is usually followed by Stolen Automobile and Battery. On Mondays, however, Driver’s License, Suspended or Revoked ranks second.

## Selecting by Count
## Source: local data frame [21 x 3]
## Groups: DayOfWeek [7]
## 
##    DayOfWeek                              Descript Count
##        <chr>                                 <chr> <int>
## 1     Friday          GRAND THEFT FROM LOCKED AUTO 24450
## 2     Friday                     STOLEN AUTOMOBILE 10008
## 3     Friday                               BATTERY  9484
## 4     Monday          GRAND THEFT FROM LOCKED AUTO 21251
## 5     Monday DRIVERS LICENSE, SUSPENDED OR REVOKED  8416
## 6     Monday                     STOLEN AUTOMOBILE  8377
## 7   Saturday          GRAND THEFT FROM LOCKED AUTO 25028
## 8   Saturday                               BATTERY 10231
## 9   Saturday                     STOLEN AUTOMOBILE  9550
## 10    Sunday          GRAND THEFT FROM LOCKED AUTO 22858
## # ... with 11 more rows
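The output above sorts days alphabetically, which places Friday before Monday. The following small sketch puts rows in calendar order instead. It uses a few (DayOfWeek, Descript, Count) rows copied from the table. In R, the usual equivalent is to make DayOfWeek a factor with explicit levels.

```python
# Calendar order, Monday first
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
DAY_INDEX = {day: i for i, day in enumerate(WEEKDAYS)}

def sort_by_weekday(rows):
    """Sort (DayOfWeek, Descript, Count) rows by weekday, then by count descending."""
    return sorted(rows, key=lambda row: (DAY_INDEX[row[0]], -row[2]))

# A few rows copied from the table above
rows = [
    ("Friday", "GRAND THEFT FROM LOCKED AUTO", 24450),
    ("Monday", "STOLEN AUTOMOBILE", 8377),
    ("Monday", "GRAND THEFT FROM LOCKED AUTO", 21251),
    ("Sunday", "GRAND THEFT FROM LOCKED AUTO", 22858),
]
for day, descript, count in sort_by_weekday(rows):
    print(day, descript, count)
```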

Police Districts with Highest Crime

The police districts with the most reported incidents are Southern and Mission. Below are the distributions of crime categories and incident descriptions for each of these two districts.
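Ranking districts and computing a per-district distribution are both simple counts. Here is a minimal sketch. It assumes each record has a `PdDistrict` field (an assumed field name) plus the `Category` field seen in the output, and the sample records are made up:

```python
from collections import Counter

def top_districts(records, n=2):
    """Return the n police districts with the most incidents."""
    return Counter(r["PdDistrict"] for r in records).most_common(n)

def district_distribution(records, district, key="Category"):
    """Return each value's share (in percent) of one district's incidents."""
    values = [r[key] for r in records if r["PdDistrict"] == district]
    counts = Counter(values)
    total = len(values)
    # Largest share first; an unknown district yields an empty dict
    return {value: 100.0 * c / total for value, c in counts.most_common()}

# Hypothetical records for illustration
sample = (
    [{"PdDistrict": "SOUTHERN", "Category": "LARCENY/THEFT"}] * 3
    + [{"PdDistrict": "SOUTHERN", "Category": "ASSAULT"}]
    + [{"PdDistrict": "MISSION", "Category": "ASSAULT"}] * 2
)
print(top_districts(sample))
print(district_distribution(sample, "SOUTHERN"))
```

Passing `key="Descript"` gives the incident description distribution instead.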

Distributions of Crime Categories and Incident Descriptions in SOUTHERN

Distributions of Crime Categories and Incident Descriptions in MISSION

Seasonality for Top Categories and Incident Descriptions

Categories - Hourly

Larceny/Theft and Vehicle Theft peak at noon and again at 6pm. All categories dip after 10pm and begin picking up again in the morning.
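Peaks such as noon and 6pm are local maxima of an hourly count series. Below is a sketch that builds that series and finds its local peaks. It assumes a `Time` field formatted as "HH:MM" on a 24-hour clock (a hypothetical field name and format), and the sample records are made up:

```python
from collections import Counter

def hourly_counts(records, category):
    """Count one category's incidents for each hour of the day (0-23).

    Assumes a Time field formatted "HH:MM" on a 24-hour clock.
    """
    counts = Counter(
        int(r["Time"].split(":")[0])
        for r in records
        if r["Category"] == category
    )
    # Hours with no incidents get an explicit zero
    return [counts.get(hour, 0) for hour in range(24)]

def local_peaks(series):
    """Return hours whose count exceeds the previous hour and is at least the next.

    Index -1 and the modulo wrap midnight around, treating the day as circular.
    """
    n = len(series)
    return [
        h for h in range(n)
        if series[h] > series[h - 1] and series[h] >= series[(h + 1) % n]
    ]

# Hypothetical records for illustration
sample = [
    {"Time": "12:05", "Category": "LARCENY/THEFT"},
    {"Time": "12:40", "Category": "LARCENY/THEFT"},
    {"Time": "18:10", "Category": "LARCENY/THEFT"},
    {"Time": "03:00", "Category": "ASSAULT"},
]
series = hourly_counts(sample, "LARCENY/THEFT")
print(local_peaks(series))
```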

Categories - Days of the Week

Larceny/Theft rises on Fridays and Saturdays and is lowest on Sundays and Mondays. Assault stays steady during the week and is highest throughout the weekend. Drug/Narcotic incidents increase on Wednesdays.

Categories - Monthly

Larceny/Theft peaks during the warmer months (July through October) and decreases during the winter. Assault remains steady for most of the year but dips in November and December (the holiday season).
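The data run from 01/01/2003 to 06/12/2017. As a result, January through May cover 15 years, while July through December cover only 14, so raw monthly totals would slightly favor the first half of the year. One way to compare months on an equal footing is to average per year, as in the sketch below. It assumes a `Date` field formatted "MM/DD/YYYY" (an assumed name and format). June 2017 is only partially covered, so June remains slightly understated even after averaging.

```python
from collections import Counter, defaultdict

def mean_monthly_counts(records, category):
    """Average one category's incidents per calendar month over the years covered.

    Assumes a Date field formatted "MM/DD/YYYY". A month's denominator is the
    number of years in which the dataset has any record for that month.
    """
    coverage = defaultdict(set)  # month -> years with at least one record
    counts = Counter()           # month -> incidents of the chosen category
    for r in records:
        month, _day, year = (int(part) for part in r["Date"].split("/"))
        coverage[month].add(year)
        if r["Category"] == category:
            counts[month] += 1
    return {m: counts[m] / len(coverage[m]) for m in sorted(coverage)}

# Hypothetical records: January appears in two years, July in one
sample = [
    {"Date": "01/15/2016", "Category": "LARCENY/THEFT"},
    {"Date": "01/20/2017", "Category": "LARCENY/THEFT"},
    {"Date": "07/04/2016", "Category": "LARCENY/THEFT"},
    {"Date": "07/05/2016", "Category": "LARCENY/THEFT"},
    {"Date": "07/06/2016", "Category": "ASSAULT"},
]
print(mean_monthly_counts(sample, "LARCENY/THEFT"))
```

In this toy example, January and July have equal raw totals, but July is higher once each month is divided by the number of years it covers.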

Descriptions - Hourly

Grand Theft from Locked Auto has a local peak at noon and reaches its daily high at 7pm, roughly twice its noon level. All descriptions dip after 10pm and begin picking up again in the morning.

Descriptions - Days of the Week

Grand Theft from Locked Auto increases during the weekend. Driver’s License, Suspended or Revoked is highest on Wednesdays.

Descriptions - Monthly

All descriptions except Grand Theft from Locked Auto appear to decrease slightly toward the end of the year, but otherwise they vary little from month to month. Grand Theft from Locked Auto fluctuates throughout the year, yet it remains the most common incident description in every month.

Heatmaps of Most Prevalent Incident Descriptions (2016)

The heatmaps below show where some of the top incident descriptions occurred in 2016. All of them show criminal activity concentrated in one part of town, the downtown area that appears red in every map, but each map also offers insights of its own.
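At its core, a heatmap is a count of points per small area. Below is a minimal grid-binning sketch that assumes (longitude, latitude) pairs. The grid origin and the 0.01-degree cell size are illustrative choices, and the maps in this report may use a smoothed density estimate rather than raw bins.

```python
from collections import Counter

def grid_counts(points, lon_min, lat_min, cell_size):
    """Bin (longitude, latitude) points into square cells and count each cell."""
    counts = Counter()
    for lon, lat in points:
        col = int((lon - lon_min) // cell_size)  # east-west cell index
        row = int((lat - lat_min) // cell_size)  # north-south cell index
        counts[(row, col)] += 1
    return counts

# Hypothetical points; the grid origin and cell size are illustrative choices
points = [(-122.405, 37.785), (-122.406, 37.786), (-122.475, 37.745)]
cells = grid_counts(points, lon_min=-122.52, lat_min=37.70, cell_size=0.01)
hottest_cell, n = cells.most_common(1)[0]
print(hottest_cell, n)  # the densest cell would get the hottest color on a heatmap
```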

[Figure: six heatmaps of 2016 incident locations, plotted over a Google road map of San Francisco]

Predicting Crime Location

The challenge: Predicting a two-dimensional variable (longitude and latitude)

Approach 1: Build two separate models, one to predict longitude and one to predict latitude. The downside: The models only predict the obvious, namely that most data points fall in downtown San Francisco (the red area in the heatmaps above).
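The text does not say which model type Approach 1 used. As a stand-in, the sketch below fits each coordinate independently using a group mean per incident description. It assumes `X` holds longitude and `Y` holds latitude (assumed field names). Each prediction is then a centroid, which shows why independent per-coordinate models tend to return the dense core rather than a specific location. A centroid of scattered incidents can even fall where none occurred.

```python
from collections import defaultdict
from statistics import mean

def fit_group_mean(records, feature, target):
    """Trivial regressor: predict the mean target value for each feature level."""
    groups = defaultdict(list)
    for r in records:
        groups[r[feature]].append(r[target])
    return {level: mean(values) for level, values in groups.items()}

def fit_two_models(records, feature="Descript"):
    """Fit longitude (X) and latitude (Y) as two independent models."""
    return fit_group_mean(records, feature, "X"), fit_group_mean(records, feature, "Y")

def predict(models, level):
    lon_model, lat_model = models
    return lon_model[level], lat_model[level]

# Hypothetical incidents on opposite sides of town
sample = [
    {"Descript": "BATTERY", "X": -122.50, "Y": 37.72},
    {"Descript": "BATTERY", "X": -122.40, "Y": 37.80},
]
models = fit_two_models(sample)
print(predict(models, "BATTERY"))  # a point between the two incidents
```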

Approach 2: Predict the address, restricted to addresses with at least 100 historical reports. Because the model takes Incident Description as one of its predictor variables, we need to build a separate model for each category. Running the classification algorithms is extremely computationally expensive.
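The text does not name the classifier. The sketch below only illustrates the data preparation described for Approach 2: it filters to addresses with at least 100 reports and splits the data by category. In place of the real classifier, it uses a majority-class baseline, and the `Address` field name is an assumption.

```python
from collections import Counter, defaultdict

MIN_REPORTS = 100  # address threshold stated in the text

def frequent_addresses(records, min_reports=MIN_REPORTS):
    """Return addresses with at least min_reports historical reports."""
    counts = Counter(r["Address"] for r in records)
    return {address for address, c in counts.items() if c >= min_reports}

def fit_per_category(records, min_reports=MIN_REPORTS):
    """Fit one model per Category, using only frequently reported addresses.

    Stand-in model (not the report's classifier): for each Descript,
    predict the address where it was reported most often.
    """
    keep = frequent_addresses(records, min_reports)
    # Category -> Descript -> Counter of addresses
    tallies = defaultdict(lambda: defaultdict(Counter))
    for r in records:
        if r["Address"] in keep:
            tallies[r["Category"]][r["Descript"]][r["Address"]] += 1
    return {
        category: {
            descript: addresses.most_common(1)[0][0]
            for descript, addresses in by_descript.items()
        }
        for category, by_descript in tallies.items()
    }

# Hypothetical records; min_reports=2 keeps the example small
sample = (
    [{"Address": "A ST", "Category": "LARCENY/THEFT",
      "Descript": "GRAND THEFT FROM LOCKED AUTO"}] * 2
    + [{"Address": "B ST", "Category": "ASSAULT", "Descript": "BATTERY"}] * 2
    + [{"Address": "C ST", "Category": "ASSAULT", "Descript": "BATTERY"}]
)
models = fit_per_category(sample, min_reports=2)
print(models)
```

Splitting by category keeps each model small, though the number of models grows with the number of categories.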