In the previous posts on crime in Chicago we explored the dataset from data.gov and applied the Pareto principle to select the most relevant crime types and locations, then estimated how those types and locations differ by day of the week and by month. See https://rpubs.com/alex-lev/248923, https://rpubs.com/alex-lev/249124, https://rpubs.com/alex-lev/249354.
Now we want to assess the level of criminal violence in Chicago by comparing the statistics for crime types against crime locations.
For more about the Chicago crime data see https://catalog.data.gov/dataset/crimes-2001-to-present-398a4.
chicago_crime<-readRDS(file = "chicago_crime.rds") # it takes two minutes to load compressed file in memory
dim(chicago_crime)
## [1] 6263265 22
names(chicago_crime)
## [1] "ID" "Case Number" "Date"
## [4] "Block" "IUCR" "Primary Type"
## [7] "Description" "Location Description" "Arrest"
## [10] "Domestic" "Beat" "District"
## [13] "Ward" "Community Area" "FBI Code"
## [16] "X Coordinate" "Y Coordinate" "Year"
## [19] "Updated On" "Latitude" "Longitude"
## [22] "Location"
The main locations of crime in Chicago are STREET, RESIDENCE, APARTMENT, SIDEWALK, OTHER, PARKING LOT/GARAGE(NON.RESID.) and ALLEY, which together account for 80% of all crimes. The main types of crime in Chicago are THEFT, BATTERY, CRIMINAL DAMAGE, NARCOTICS, OTHER OFFENSE, ASSAULT and BURGLARY, which likewise account for 80% of all crimes. Now we’ll try to estimate the difference in the number of crimes by type versus location using contingency tables. See https://en.wikipedia.org/wiki/Contingency_table.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
loc<-c("STREET","RESIDENCE","APARTMENT","SIDEWALK","OTHER","PARKING LOT/GARAGE(NON.RESID.)","ALLEY")
cr_prm_loc<-filter(chicago_crime,`Location Description`%in% loc)
typ<-c("THEFT", "BATTERY", "CRIMINAL DAMAGE", "NARCOTICS", "OTHER OFFENSE", "ASSAULT", "BURGLARY")
cr_prm_loc_typ<-filter(cr_prm_loc,`Primary Type` %in% typ)
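# Use backticks, not quotes: 'Primary Type' in quotes would group by a constant string
cr_prm_loc_typ_tbl<-cr_prm_loc_typ %>% group_by(`Primary Type`,`Location Description`)
# Select the two columns by name rather than by position (6 and 8)
cr_prm_loc_typ_tbl_2<-table(cr_prm_loc_typ_tbl[,c("Primary Type","Location Description")])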
cr_prm_loc_typ_tbl_2
## Location Description
## Primary Type ALLEY APARTMENT OTHER PARKING LOT/GARAGE(NON.RESID.)
## ASSAULT 8686 46079 13421 6475
## BATTERY 25382 221942 25022 14866
## BURGLARY 407 104444 13247 1051
## CRIMINAL DAMAGE 12692 72566 27340 30556
## NARCOTICS 43523 11204 11822 12687
## OTHER OFFENSE 2409 58776 15978 2830
## THEFT 12765 61442 75426 66868
## Location Description
## Primary Type RESIDENCE SIDEWALK STREET
## ASSAULT 66423 52037 80088
## BATTERY 239291 170265 206407
## BURGLARY 125842 153 938
## CRIMINAL DAMAGE 136681 9457 260773
## NARCOTICS 26082 216276 246248
## OTHER OFFENSE 181200 10112 64519
## THEFT 139723 41009 360741
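Raw counts are dominated by the overall size of each crime type, so it can help to look at row shares before testing. The following is a minimal sketch, not part of the original analysis: `sub_tbl` is an illustrative name, and its counts are copied from the BURGLARY and NARCOTICS rows of the table above for three of the seven locations.

```r
# Illustrative sub-table: counts copied from the contingency table above
sub_tbl <- matrix(
  c(125842,    153,    938,
     26082, 216276, 246248),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    `Primary Type` = c("BURGLARY", "NARCOTICS"),
    `Location Description` = c("RESIDENCE", "SIDEWALK", "STREET")
  )
)

# Share of each location within a crime type (each row sums to 1)
row_share <- prop.table(sub_tbl, margin = 1)
round(row_share, 3)
```

Within this sub-table almost all burglaries fall under RESIDENCE, while narcotics cases are split between SIDEWALK and STREET. The shares are relative to the three selected locations only, not to all seven.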
We apply the chi-squared test to the contingency table generated above. The null hypothesis is that crime type and crime location are independent. See https://en.wikipedia.org/wiki/Chi-squared_test.
# crime type vs. location
ch.fit.loc.typ<-chisq.test(cr_prm_loc_typ_tbl_2)
ch.fit.loc.typ
##
## Pearson's Chi-squared test
##
## data: cr_prm_loc_typ_tbl_2
## X-squared = 1395600, df = 36, p-value < 2.2e-16
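To make the test statistic less of a black box, here is a minimal sketch of what `chisq.test` computes, reproduced by hand on a small sub-table. The name `sub_tbl` and the choice of rows and columns are illustrative assumptions; the counts are copied from the table above. A 2 x 3 table is used so that the continuity correction that `chisq.test` applies to 2 x 2 tables by default does not come into play.

```r
# Illustrative sub-table: counts copied from the contingency table above
sub_tbl <- matrix(
  c(125842,    153,    938,
     26082, 216276, 246248),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    `Primary Type` = c("BURGLARY", "NARCOTICS"),
    `Location Description` = c("RESIDENCE", "SIDEWALK", "STREET")
  )
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(sub_tbl), colSums(sub_tbl)) / sum(sub_tbl)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
x2_manual <- sum((sub_tbl - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df_manual <- (nrow(sub_tbl) - 1) * (ncol(sub_tbl) - 1)

# Compare with the built-in test
fit <- chisq.test(sub_tbl)
all.equal(unname(fit$statistic), x2_manual)
all.equal(unname(fit$parameter), df_manual)
```

The same formula gives df = (7 - 1) * (7 - 1) = 36 for the full 7 x 7 table, matching the output above. With more than six million records, even small departures from independence produce a tiny p-value, so the size of each cell's contribution is more informative than the p-value itself.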
Here we produce a chart demonstrating the differences between crime locations, using the results of the chi-squared test. Let’s see!
library(corrplot)
# Visualize the contribution (in %) of each crime type and location
# to the chi-squared statistic
contrib.loc.typ <- 100*ch.fit.loc.typ$residuals^2/ch.fit.loc.typ$statistic
round(contrib.loc.typ, 3)
## Location Description
## Primary Type ALLEY APARTMENT OTHER PARKING LOT/GARAGE(NON.RESID.)
## ASSAULT 0.005 0.012 0.000 0.096
## BATTERY 0.002 3.096 0.650 0.750
## BURGLARY 0.457 7.854 0.005 0.515
## CRIMINAL DAMAGE 0.050 0.177 0.000 0.355
## NARCOTICS 3.154 4.948 0.698 0.242
## OTHER OFFENSE 0.398 0.041 0.003 0.536
## THEFT 0.282 2.060 2.639 3.791
## Location Description
## Primary Type RESIDENCE SIDEWALK STREET
## ASSAULT 0.006 0.401 0.105
## BATTERY 0.044 1.231 2.211
## BURGLARY 4.716 2.401 5.784
## CRIMINAL DAMAGE 0.002 4.145 2.251
## NARCOTICS 6.850 17.556 1.165
## OTHER OFFENSE 7.910 2.016 1.476
## THEFT 0.983 2.742 3.186
corrplot(contrib.loc.typ, is.cor = FALSE)
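The contribution values are squared, so they show which cells matter most but not whether a cell has more or fewer crimes than independence would predict. A minimal sketch of how to recover the direction from the Pearson residuals follows; `sub_tbl` and `direction` are illustrative names, and the counts are copied from the contingency table above.

```r
# Illustrative sub-table: counts copied from the contingency table above
sub_tbl <- matrix(
  c(125842,    153,    938,
     26082, 216276, 246248),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    `Primary Type` = c("BURGLARY", "NARCOTICS"),
    `Location Description` = c("RESIDENCE", "SIDEWALK", "STREET")
  )
)

fit <- chisq.test(sub_tbl)

# Percentage contribution of each cell; the cells add up to 100
contrib <- 100 * fit$residuals^2 / fit$statistic
sum(contrib)

# Sign of the Pearson residual: positive means more crimes than expected
direction <- ifelse(fit$residuals > 0, "more than expected", "fewer than expected")
direction
```

On the full table, the signed residuals can be plotted the same way as the contributions, for example with `corrplot(ch.fit.loc.typ$residuals, is.cor = FALSE)`.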