Introduction

In the previous posts on crime in Chicago we explored the dataset from data.gov and applied the Pareto principle to select the most relevant types and locations of crimes, in order to estimate how crime types and locations vary by day of the week and by month. See https://rpubs.com/alex-lev/248923, https://rpubs.com/alex-lev/249124, https://rpubs.com/alex-lev/249354.

Research goal

Now we want to estimate the level of criminal violence in Chicago by comparing the statistics for crime types with those for crime locations.

Data

For more about the Chicago crime data, see https://catalog.data.gov/dataset/crimes-2001-to-present-398a4.

chicago_crime<-readRDS(file = "chicago_crime.rds") # loading the compressed file into memory takes about two minutes

dim(chicago_crime)
## [1] 6263265      22
names(chicago_crime)
##  [1] "ID"                   "Case Number"          "Date"                
##  [4] "Block"                "IUCR"                 "Primary Type"        
##  [7] "Description"          "Location Description" "Arrest"              
## [10] "Domestic"             "Beat"                 "District"            
## [13] "Ward"                 "Community Area"       "FBI Code"            
## [16] "X Coordinate"         "Y Coordinate"         "Year"                
## [19] "Updated On"           "Latitude"             "Longitude"           
## [22] "Location"

Contingency tables

The main locations of crime in Chicago are STREET, RESIDENCE, APARTMENT, SIDEWALK, OTHER, PARKING LOT/GARAGE(NON.RESID.) and ALLEY, which together account for 80% of all crimes. The main types of crime are THEFT, BATTERY, CRIMINAL DAMAGE, NARCOTICS, OTHER OFFENSE, ASSAULT and BURGLARY, which likewise account for 80% of all crimes. Now we’ll estimate the difference in the number of crimes by type versus location using a contingency table. See https://en.wikipedia.org/wiki/Contingency_table.
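The 80% selection follows the Pareto idea from the earlier posts. Below is a minimal sketch of that selection, assuming categories are ranked by frequency and kept until their cumulative share first reaches 80%. The function name `pareto_top` and the toy data are hypothetical; this is not necessarily the code used in the earlier posts.

```r
# Hypothetical toy data: one location label per recorded crime (100 crimes).
locations <- c(rep("STREET", 50), rep("RESIDENCE", 25), rep("ALLEY", 10),
               rep("PARK", 8), rep("SCHOOL", 4), rep("BANK", 3))

# Hypothetical helper: the smallest set of most frequent categories
# whose cumulative share of all records reaches `cutoff`.
pareto_top <- function(x, cutoff = 0.8) {
  counts <- sort(table(x), decreasing = TRUE)  # frequencies, largest first
  cum_share <- cumsum(counts) / sum(counts)    # cumulative proportion
  # keep categories up to and including the first one that crosses the cutoff
  n_keep <- which(cum_share >= cutoff)[1]
  names(counts)[seq_len(n_keep)]
}

pareto_top(locations)
```

Here STREET and RESIDENCE cover 75% of the toy records, so ALLEY is also needed to pass 80%, and the call returns `"STREET" "RESIDENCE" "ALLEY"`.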

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
loc<-c("STREET","RESIDENCE","APARTMENT","SIDEWALK","OTHER","PARKING LOT/GARAGE(NON.RESID.)","ALLEY")
cr_prm_loc<-filter(chicago_crime,`Location Description`%in% loc)

typ<-c("THEFT", "BATTERY", "CRIMINAL DAMAGE", "NARCOTICS", "OTHER OFFENSE", "ASSAULT", "BURGLARY")
cr_prm_loc_typ<-filter(cr_prm_loc,`Primary Type` %in% typ)

# cross-tabulate crime type (rows) against crime location (columns);
# selecting the columns by name avoids relying on their positions
cr_prm_loc_typ_tbl_2<-table(cr_prm_loc_typ[,c("Primary Type","Location Description")])

cr_prm_loc_typ_tbl_2
##                  Location Description
## Primary Type       ALLEY APARTMENT  OTHER PARKING LOT/GARAGE(NON.RESID.)
##   ASSAULT           8686     46079  13421                           6475
##   BATTERY          25382    221942  25022                          14866
##   BURGLARY           407    104444  13247                           1051
##   CRIMINAL DAMAGE  12692     72566  27340                          30556
##   NARCOTICS        43523     11204  11822                          12687
##   OTHER OFFENSE     2409     58776  15978                           2830
##   THEFT            12765     61442  75426                          66868
##                  Location Description
## Primary Type      RESIDENCE SIDEWALK STREET
##   ASSAULT             66423    52037  80088
##   BATTERY            239291   170265 206407
##   BURGLARY           125842      153    938
##   CRIMINAL DAMAGE    136681     9457 260773
##   NARCOTICS           26082   216276 246248
##   OTHER OFFENSE      181200    10112  64519
##   THEFT              139723    41009 360741

Chi-squared test

We apply Pearson’s chi-squared test of independence to the contingency table generated above. The null hypothesis is that crime type and crime location are independent. See https://en.wikipedia.org/wiki/Chi-squared_test

# test independence of crime type and crime location
ch.fit.loc.typ<-chisq.test(cr_prm_loc_typ_tbl_2)
ch.fit.loc.typ
## 
##  Pearson's Chi-squared test
## 
## data:  cr_prm_loc_typ_tbl_2
## X-squared = 1395600, df = 36, p-value < 2.2e-16
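To make the test less of a black box, here is a sketch that recomputes Pearson’s statistic by hand on a small hypothetical 2 x 3 table (not the Chicago data): expected counts come from row total times column total divided by the grand total, and the degrees of freedom are (rows - 1) x (columns - 1), which gives df = 36 for the 7 x 7 crime table.

```r
# Hypothetical 2 x 3 table of counts (crime type by location), not real data.
obs <- matrix(c(30, 10, 20,
                10, 30, 20),
              nrow = 2, byrow = TRUE,
              dimnames = list(type = c("A", "B"), location = c("X", "Y", "Z")))

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson's statistic, degrees of freedom and upper-tail p-value.
x2 <- sum((obs - expected)^2 / expected)
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

# Compare with the built-in test (the continuity correction only
# applies to 2 x 2 tables, so none is used here).
fit <- chisq.test(obs)
c(manual = x2, builtin = unname(fit$statistic))
```

Every expected count in this toy table is 20, so the statistic is 4 x 10^2 / 20 = 20 with 2 degrees of freedom, matching `chisq.test()`.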

Graphical interpretation

Here we chart how much each combination of crime type and location contributes to the chi-squared statistic, using the results of the test above. Let’s see!

library(corrplot)
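# Visualize the contribution of each (type, location) cell to the chi-squared statistic

# percentage contribution of each cell: squared Pearson residual / X-squared * 100
contrib.loc.typ <- 100*ch.fit.loc.typ$residuals^2/ch.fit.loc.typ$statistic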
round(contrib.loc.typ, 3)
##                  Location Description
## Primary Type       ALLEY APARTMENT  OTHER PARKING LOT/GARAGE(NON.RESID.)
##   ASSAULT          0.005     0.012  0.000                          0.096
##   BATTERY          0.002     3.096  0.650                          0.750
##   BURGLARY         0.457     7.854  0.005                          0.515
##   CRIMINAL DAMAGE  0.050     0.177  0.000                          0.355
##   NARCOTICS        3.154     4.948  0.698                          0.242
##   OTHER OFFENSE    0.398     0.041  0.003                          0.536
##   THEFT            0.282     2.060  2.639                          3.791
##                  Location Description
## Primary Type      RESIDENCE SIDEWALK STREET
##   ASSAULT             0.006    0.401  0.105
##   BATTERY             0.044    1.231  2.211
##   BURGLARY            4.716    2.401  5.784
##   CRIMINAL DAMAGE     0.002    4.145  2.251
##   NARCOTICS           6.850   17.556  1.165
##   OTHER OFFENSE       7.910    2.016  1.476
##   THEFT               0.983    2.742  3.186
corrplot(contrib.loc.typ, is.corr = FALSE) # values are percentages, not correlations
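Squaring the residuals makes every contribution positive, so a large contribution can mean either many more or many fewer crimes than independence predicts. A small sketch on a hypothetical 2 x 3 table (not the Chicago data) shows that the contributions sum to 100% and that the sign of the Pearson residual recovers the direction. The same check can be applied to `ch.fit.loc.typ$residuals`.

```r
# Hypothetical 2 x 3 table standing in for the crime table.
obs <- matrix(c(30, 10, 20,
                10, 30, 20), nrow = 2, byrow = TRUE,
              dimnames = list(type = c("A", "B"), location = c("X", "Y", "Z")))
fit <- chisq.test(obs)

# Pearson residuals: (observed - expected) / sqrt(expected).
res <- fit$residuals

# Percentage contribution of each cell, as in the article; sums to 100.
contrib <- 100 * res^2 / fit$statistic

# The residual's sign tells over- (+) or under-representation (-).
direction <- ifelse(res > 0, "more than expected",
                    ifelse(res < 0, "fewer than expected", "as expected"))
round(contrib, 1)
direction
```

In this toy table cells A/X and B/X each contribute 25%, but A/X has more crimes than expected and B/X fewer, a distinction the contribution chart alone cannot show.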

Conclusions

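  1. NARCOTICS on SIDEWALK is the largest single contributor (17.6% of the chi-squared statistic): far more narcotics offenses are recorded on sidewalks than independence would predict, which makes it the major problem for Chicago.
  2. BURGLARY is concentrated in APARTMENT (7.9%) and RESIDENCE (4.7%). Its large STREET contribution (5.8%) reflects the opposite: burglaries on the street are far rarer than expected.
  3. OTHER OFFENSE in RESIDENCE (7.9%) is over-represented, while NARCOTICS in RESIDENCE (6.9%) and CRIMINAL DAMAGE on SIDEWALK (4.1%) are under-represented relative to independence.
  4. THEFT in PARKING LOT/GARAGE(NON.RESID.) (3.8%) and on STREET (3.2%) contributes less, so it is of lesser importance for Chicago police.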