Trump referred to the violence in Chicago as “horrible carnage”
While Donald Trump’s opponents keep demonstrating their contempt for his actions, the new president has explicitly addressed Chicago’s violence problem, which he inherited from his predecessor, Democrat Barack Obama.
“If they’re not going to solve the problem …. then we’re going to solve the problem for them. Because we’re going to have to do something,” Trump said of Chicago officials. “What’s happening in Chicago should not be happening in this country.”
We want to estimate the level of criminal violence in Chicago using open data that is far too large for simple manual calculations.
We use the dataset “Crimes - 2001 to present” from data.gov. This dataset reflects reported incidents of crime that occurred in the City of Chicago from 2001 to present. Data is extracted from the Chicago Police Department’s CLEAR (Citizen Law Enforcement Analysis and Reporting) system.
For more about the Chicago crime data, see https://catalog.data.gov/dataset/crimes-2001-to-present-398a4.
chicago_crime <- readRDS(file = "chicago_crime.rds") # loading the compressed file into memory takes about two minutes
dim(chicago_crime)
## [1] 6263265 22
names(chicago_crime)
## [1] "ID" "Case Number" "Date"
## [4] "Block" "IUCR" "Primary Type"
## [7] "Description" "Location Description" "Arrest"
## [10] "Domestic" "Beat" "District"
## [13] "Ward" "Community Area" "FBI Code"
## [16] "X Coordinate" "Y Coordinate" "Year"
## [19] "Updated On" "Latitude" "Longitude"
## [22] "Location"
head(chicago_crime[,c(3,6,8)])
## Date Primary Type Location Description
## 1 02/20/2011 02:05:00 PM BATTERY APARTMENT
## 2 02/20/2011 03:30:00 PM CRIMINAL DAMAGE PARKING LOT/GARAGE(NON.RESID.)
## 3 02/20/2011 07:15:00 PM NARCOTICS STREET
## 4 02/20/2011 05:10:00 PM ASSAULT SIDEWALK
## 5 02/18/2011 07:00:00 AM THEFT PARKING LOT/GARAGE(NON.RESID.)
## 6 02/20/2011 01:00:00 PM BATTERY APARTMENT
The Pareto principle (also known as the 80/20 rule, the law of the vital few, or the principle of factor sparsity) states that, for many events, roughly 80% of the effects come from 20% of the causes.
For more about the Pareto principle, see https://en.wikipedia.org/wiki/Pareto_principle.
Let’s apply this genuinely universal principle to our data on Chicago’s crimes.
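As a minimal sketch of how the rule can be checked numerically, the snippet below sorts a set of counts, accumulates their share of the total, and finds how many causes are needed to reach 80%. The counts and the names `effects`, `cum_share`, and `n_vital` are hypothetical, chosen only for illustration.

```r
# Hypothetical counts for ten causes (illustrative, not taken from the crime data)
effects <- c(55, 27, 5, 4, 3, 2, 1, 1, 1, 1)
names(effects) <- letters[1:10]

# Sort from largest to smallest, then accumulate each cause's share of the total
effects_sorted <- sort(effects, decreasing = TRUE)
cum_share <- cumsum(effects_sorted) / sum(effects_sorted)

# Smallest number of causes whose combined share reaches 80%
n_vital <- unname(which(cum_share >= 0.8)[1])
n_vital                      # number of "vital few" causes
n_vital / length(effects)    # their fraction of all causes
```

In this toy example two of the ten causes (20%) account for 82% of the effects, which is the pattern the rule describes.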
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(qcc)
## Package 'qcc', version 2.6
## Type 'citation("qcc")' for citing this R package in publications.
crime_types <- chicago_crime %>% count(`Primary Type`) %>% arrange(desc(n))
crime_types
## # A tibble: 35 × 2
## `Primary Type` n
## <chr> <int>
## 1 THEFT 1301914
## 2 BATTERY 1142746
## 3 CRIMINAL DAMAGE 720406
## 4 NARCOTICS 687889
## 5 OTHER OFFENSE 387741
## 6 ASSAULT 382636
## 7 BURGLARY 365988
## 8 MOTOR VEHICLE THEFT 295474
## 9 ROBBERY 236391
## 10 DECEPTIVE PRACTICE 226195
## # ... with 25 more rows
# Keep the 20 most frequent crime types as a named vector for the chart
crimes <- head(crime_types$n, 20)
names(crimes) <- head(crime_types$`Primary Type`, 20)
pareto.chart(x = crimes)
##
## Pareto chart analysis for crimes
## Frequency Cum.Freq. Percentage
## THEFT 1301914 1301914 20.8954836
## BATTERY 1142746 2444660 18.3408660
## CRIMINAL DAMAGE 720406 3165066 11.5623856
## NARCOTICS 687889 3852955 11.0404937
## OTHER OFFENSE 387741 4240696 6.2231727
## ASSAULT 382636 4623332 6.1412384
## BURGLARY 365988 4989320 5.8740410
## MOTOR VEHICLE THEFT 295474 5284794 4.7423041
## ROBBERY 236391 5521185 3.7940327
## DECEPTIVE PRACTICE 226195 5747380 3.6303887
## CRIMINAL TRESPASS 181059 5928439 2.9059641
## PROSTITUTION 67042 5995481 1.0760119
## WEAPONS VIOLATION 61541 6057022 0.9877219
## PUBLIC PEACE VIOLATION 45218 6102240 0.7257407
## OFFENSE INVOLVING CHILDREN 40837 6143077 0.6554264
## CRIM SEXUAL ASSAULT 23904 6166981 0.3836549
## SEX OFFENSE 22972 6189953 0.3686964
## GAMBLING 14036 6203989 0.2252753
## LIQUOR LAW VIOLATION 13629 6217618 0.2187430
## INTERFERENCE WITH PUBLIC OFFICER 12982 6230600 0.2083587
##
## Pareto chart analysis for crimes
## Cum.Percent.
## THEFT 20.89548
## BATTERY 39.23635
## CRIMINAL DAMAGE 50.79874
## NARCOTICS 61.83923
## OTHER OFFENSE 68.06240
## ASSAULT 74.20364
## BURGLARY 80.07768
## MOTOR VEHICLE THEFT 84.81999
## ROBBERY 88.61402
## DECEPTIVE PRACTICE 92.24441
## CRIMINAL TRESPASS 95.15037
## PROSTITUTION 96.22638
## WEAPONS VIOLATION 97.21410
## PUBLIC PEACE VIOLATION 97.93985
## OFFENSE INVOLVING CHILDREN 98.59527
## CRIM SEXUAL ASSAULT 98.97893
## SEX OFFENSE 99.34762
## GAMBLING 99.57290
## LIQUOR LAW VIOLATION 99.79164
## INTERFERENCE WITH PUBLIC OFFICER 100.00000
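Note that the percentages printed by pareto.chart() are relative to the 20 counts passed to it (6,230,600 incidents), not to all 6,263,265 rows of the dataset. The sketch below recomputes the cumulative share of the seven most frequent crime types against both totals, using only numbers printed above. The names `top_counts`, `top20_total`, and `total_rows` are ours, and we assume every row carries a `Primary Type`.

```r
# Counts of the seven most frequent crime types, copied from the output above
top_counts <- c(THEFT = 1301914, BATTERY = 1142746, `CRIMINAL DAMAGE` = 720406,
                NARCOTICS = 687889, `OTHER OFFENSE` = 387741,
                ASSAULT = 382636, BURGLARY = 365988)

top20_total <- 6230600  # sum of the 20 counts passed to pareto.chart()
total_rows  <- 6263265  # number of rows reported by dim(chicago_crime)

# Cumulative share relative to the top 20 (what the chart reports)
round(100 * cumsum(top_counts) / top20_total, 2)

# Cumulative share relative to every recorded incident
round(100 * cumsum(top_counts) / total_rows, 2)
```

Measured against all rows, these seven types (20% of the 35 types) cover about 79.7% of incidents rather than 80.1%, which is still very close to the 80/20 pattern.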
crime_locations <- chicago_crime %>% count(`Location Description`) %>% arrange(desc(n))
crime_locations
## # A tibble: 173 × 2
## `Location Description` n
## <chr> <int>
## 1 STREET 1663980
## 2 RESIDENCE 1059717
## 3 APARTMENT 636437
## 4 SIDEWALK 626755
## 5 OTHER 236301
## 6 PARKING LOT/GARAGE(NON.RESID.) 179408
## 7 ALLEY 141544
## 8 SCHOOL, PUBLIC, BUILDING 136155
## 9 RESIDENCE-GARAGE 123949
## 10 RESIDENCE PORCH/HALLWAY 109648
## # ... with 163 more rows
# Keep the 20 most frequent location descriptions as a named vector for the chart
locations <- head(crime_locations$n, 20)
names(locations) <- head(crime_locations$`Location Description`, 20)
pareto.chart(x = locations)
##
## Pareto chart analysis for locations
## Frequency Cum.Freq. Percentage
## STREET 1663980 1663980 29.4861167
## RESIDENCE 1059717 2723697 18.7784343
## APARTMENT 636437 3360134 11.2778132
## SIDEWALK 626755 3986889 11.1062459
## OTHER 236301 4223190 4.1873093
## PARKING LOT/GARAGE(NON.RESID.) 179408 4402598 3.1791519
## ALLEY 141544 4544142 2.5081930
## SCHOOL, PUBLIC, BUILDING 136155 4680297 2.4126986
## RESIDENCE-GARAGE 123949 4804246 2.1964054
## RESIDENCE PORCH/HALLWAY 109648 4913894 1.9429883
## SMALL RETAIL STORE 106566 5020460 1.8883746
## VEHICLE NON-COMMERCIAL 99313 5119773 1.7598497
## RESTAURANT 92848 5212621 1.6452884
## GROCERY FOOD STORE 81027 5293648 1.4358175
## DEPARTMENT STORE 75425 5369073 1.3365487
## GAS STATION 65162 5434235 1.1546860
## RESIDENTIAL YARD (FRONT/BACK) 59881 5494116 1.0611054
## CHA PARKING LOT/GROUNDS 54410 5548526 0.9641580
## PARK PROPERTY 48619 5597145 0.8615401
## COMMERCIAL / BUSINESS OFFICE 46121 5643266 0.8172750
##
## Pareto chart analysis for locations
## Cum.Percent.
## STREET 29.48612
## RESIDENCE 48.26455
## APARTMENT 59.54236
## SIDEWALK 70.64861
## OTHER 74.83592
## PARKING LOT/GARAGE(NON.RESID.) 78.01507
## ALLEY 80.52326
## SCHOOL, PUBLIC, BUILDING 82.93596
## RESIDENCE-GARAGE 85.13237
## RESIDENCE PORCH/HALLWAY 87.07536
## SMALL RETAIL STORE 88.96373
## VEHICLE NON-COMMERCIAL 90.72358
## RESTAURANT 92.36887
## GROCERY FOOD STORE 93.80469
## DEPARTMENT STORE 95.14124
## GAS STATION 96.29592
## RESIDENTIAL YARD (FRONT/BACK) 97.35703
## CHA PARKING LOT/GROUNDS 98.32118
## PARK PROPERTY 99.18273
## COMMERCIAL / BUSINESS OFFICE 100.00000
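The top 20 of the 173 location categories cover 5,643,266 of the 6,263,265 incidents (about 90.1%), so the percentages above leave out roughly a tenth of the data. One possible way to keep the chart readable while still accounting for every incident is to lump the tail into a single bar. The helper `lump_tail()` below is a hypothetical sketch, not part of dplyr or qcc, and it is demonstrated on toy counts.

```r
# Hypothetical helper: keep the k largest counts and sum the rest into one bar
lump_tail <- function(counts, k, label = "ALL OTHERS") {
  stopifnot(k >= 1)
  counts <- sort(counts, decreasing = TRUE)
  if (k >= length(counts)) return(counts)
  c(head(counts, k), setNames(sum(counts[-seq_len(k)]), label))
}

# Toy counts (illustrative, not from the crime data)
toy <- c(A = 50, B = 30, C = 10, D = 5, E = 3, F = 2)
lumped <- lump_tail(toy, k = 3)
round(100 * lumped / sum(lumped), 1)

# Assumed usage with the objects defined above (not run here):
# all_locations <- setNames(crime_locations$n, crime_locations$`Location Description`)
# pareto.chart(x = lump_tail(all_locations, k = 20))
```

Because pareto.chart() orders bars by frequency, the lumped bar will not necessarily appear last in the chart.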