Trump referred to the violence in Chicago as “horrible carnage”

What’s wrong with Chicago?

While Donald Trump’s opponents continue to show contempt for his actions, the new president has spoken explicitly about Chicago’s violence problem, which he inherited from his predecessor, Democrat Barack Obama.

“If they’re not going to solve the problem …. then we’re going to solve the problem for them. Because we’re going to have to do something,” Trump said of Chicago officials. “What’s happening in Chicago should not be happening in this country.”

Research goal

We want to estimate the level of criminal violence in Chicago using open data that is far too large for simple manual calculation.

Data

We use the dataset “Crimes - 2001 to present” from data.gov. It reflects reported incidents of crime that occurred in the City of Chicago from 2001 to the present. The data are extracted from the Chicago Police Department’s CLEAR (Citizen Law Enforcement Analysis and Reporting) system.

For more about the Chicago crime data, see https://catalog.data.gov/dataset/crimes-2001-to-present-398a4.

chicago_crime <- readRDS(file = "chicago_crime.rds") # loading the compressed file into memory takes about two minutes
dim(chicago_crime)
## [1] 6263265      22
names(chicago_crime)
##  [1] "ID"                   "Case Number"          "Date"                
##  [4] "Block"                "IUCR"                 "Primary Type"        
##  [7] "Description"          "Location Description" "Arrest"              
## [10] "Domestic"             "Beat"                 "District"            
## [13] "Ward"                 "Community Area"       "FBI Code"            
## [16] "X Coordinate"         "Y Coordinate"         "Year"                
## [19] "Updated On"           "Latitude"             "Longitude"           
## [22] "Location"
head(chicago_crime[,c(3,6,8)])
##                     Date    Primary Type           Location Description
## 1 02/20/2011 02:05:00 PM         BATTERY                      APARTMENT
## 2 02/20/2011 03:30:00 PM CRIMINAL DAMAGE PARKING LOT/GARAGE(NON.RESID.)
## 3 02/20/2011 07:15:00 PM       NARCOTICS                         STREET
## 4 02/20/2011 05:10:00 PM         ASSAULT                       SIDEWALK
## 5 02/18/2011 07:00:00 AM           THEFT PARKING LOT/GARAGE(NON.RESID.)
## 6 02/20/2011 01:00:00 PM         BATTERY                      APARTMENT
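
The `.rds` file loaded above has to be created once from the raw download. The source does not show that step, so the following is only a minimal sketch of one way to do it, assuming the data were exported as CSV. The temporary file paths, the `sample_crime` name, and the three-row sample are hypothetical stand-ins for the real export.

```r
# Hypothetical three-row stand-in for the downloaded CSV export
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "ID,Date,Primary Type,Location Description",
  "1,02/20/2011 02:05:00 PM,BATTERY,APARTMENT",
  "2,02/20/2011 07:15:00 PM,NARCOTICS,STREET",
  "3,02/18/2011 07:00:00 AM,THEFT,STREET"
), csv_path)

# check.names = FALSE keeps column names with spaces, such as "Primary Type"
sample_crime <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)

# Save once in R's compressed binary format, then reload with readRDS()
rds_path <- tempfile(fileext = ".rds")
saveRDS(sample_crime, rds_path)
reloaded <- readRDS(rds_path)

dim(reloaded)
names(reloaded)
```

Converting once and reloading with `readRDS()` avoids re-parsing the text file in every session, which matters for a table with millions of rows.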

Pareto principle

The Pareto principle (also known as the 80/20 rule, the law of the vital few, or the principle of factor sparsity) states that, for many events, roughly 80% of the effects come from 20% of the causes.

For more about the Pareto principle, see https://en.wikipedia.org/wiki/Pareto_principle.
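
To make the rule concrete, here is a small base R sketch on toy numbers. The five causes and their counts are invented for illustration and are not taken from the Chicago data.

```r
# Toy counts for five hypothetical causes (not taken from the Chicago data)
counts <- c(A = 50, B = 32, C = 10, D = 5, E = 3)

# Sort from largest to smallest, then accumulate the percentage shares
sorted  <- sort(counts, decreasing = TRUE)
cum_pct <- cumsum(sorted) / sum(sorted) * 100
cum_pct

# Smallest number of causes whose combined share reaches 80%
n_vital <- which(cum_pct >= 80)[1]
unname(n_vital)
```

In this toy case two of the five causes (40% of the causes) already cover 82% of the effects, which is the kind of concentration a Pareto chart is meant to reveal.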

Let’s apply this general principle to our data on Chicago’s crimes.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(qcc)
## Package 'qcc', version 2.6
## Type 'citation("qcc")' for citing this R package in publications.
crime_types <- chicago_crime %>% count(`Primary Type`) %>% arrange(desc(n))
crime_types
## # A tibble: 35 × 2
##         `Primary Type`       n
##                  <chr>   <int>
## 1                THEFT 1301914
## 2              BATTERY 1142746
## 3      CRIMINAL DAMAGE  720406
## 4            NARCOTICS  687889
## 5        OTHER OFFENSE  387741
## 6              ASSAULT  382636
## 7             BURGLARY  365988
## 8  MOTOR VEHICLE THEFT  295474
## 9              ROBBERY  236391
## 10  DECEPTIVE PRACTICE  226195
## # ... with 25 more rows
# Keep only the 20 most frequent types; the chart's percentages are relative to these 20
crimes <- head(crime_types$n, 20)
names(crimes) <- head(crime_types$`Primary Type`, 20)
pareto.chart(x = crimes)

##                                   
## Pareto chart analysis for crimes
##                                    Frequency Cum.Freq. Percentage
##   THEFT                              1301914   1301914 20.8954836
##   BATTERY                            1142746   2444660 18.3408660
##   CRIMINAL DAMAGE                     720406   3165066 11.5623856
##   NARCOTICS                           687889   3852955 11.0404937
##   OTHER OFFENSE                       387741   4240696  6.2231727
##   ASSAULT                             382636   4623332  6.1412384
##   BURGLARY                            365988   4989320  5.8740410
##   MOTOR VEHICLE THEFT                 295474   5284794  4.7423041
##   ROBBERY                             236391   5521185  3.7940327
##   DECEPTIVE PRACTICE                  226195   5747380  3.6303887
##   CRIMINAL TRESPASS                   181059   5928439  2.9059641
##   PROSTITUTION                         67042   5995481  1.0760119
##   WEAPONS VIOLATION                    61541   6057022  0.9877219
##   PUBLIC PEACE VIOLATION               45218   6102240  0.7257407
##   OFFENSE INVOLVING CHILDREN           40837   6143077  0.6554264
##   CRIM SEXUAL ASSAULT                  23904   6166981  0.3836549
##   SEX OFFENSE                          22972   6189953  0.3686964
##   GAMBLING                             14036   6203989  0.2252753
##   LIQUOR LAW VIOLATION                 13629   6217618  0.2187430
##   INTERFERENCE WITH PUBLIC OFFICER     12982   6230600  0.2083587
##                                   
## Pareto chart analysis for crimes
##                                    Cum.Percent.
##   THEFT                                20.89548
##   BATTERY                              39.23635
##   CRIMINAL DAMAGE                      50.79874
##   NARCOTICS                            61.83923
##   OTHER OFFENSE                        68.06240
##   ASSAULT                              74.20364
##   BURGLARY                             80.07768
##   MOTOR VEHICLE THEFT                  84.81999
##   ROBBERY                              88.61402
##   DECEPTIVE PRACTICE                   92.24441
##   CRIMINAL TRESPASS                    95.15037
##   PROSTITUTION                         96.22638
##   WEAPONS VIOLATION                    97.21410
##   PUBLIC PEACE VIOLATION               97.93985
##   OFFENSE INVOLVING CHILDREN           98.59527
##   CRIM SEXUAL ASSAULT                  98.97893
##   SEX OFFENSE                          99.34762
##   GAMBLING                             99.57290
##   LIQUOR LAW VIOLATION                 99.79164
##   INTERFERENCE WITH PUBLIC OFFICER    100.00000
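
The chart above is built from `head(crime_types$n, 20)`, so its percentages are relative to the 20 most frequent types rather than to all 35. A rough check of how much this matters, using only numbers printed above; the variable names are chosen here for illustration.

```r
# Frequencies of the seven most common crime types, copied from the output above
top7 <- c(THEFT = 1301914, BATTERY = 1142746, `CRIMINAL DAMAGE` = 720406,
          NARCOTICS = 687889, `OTHER OFFENSE` = 387741, ASSAULT = 382636,
          BURGLARY = 365988)

total_records <- 6263265  # row count reported by dim(chicago_crime)
top20_total   <- 6230600  # final Cum.Freq. value in the chart output

# Share of the top seven types relative to each denominator, in percent
share_of_top20 <- sum(top7) / top20_total * 100
share_of_all   <- sum(top7) / total_records * 100
round(c(top20 = share_of_top20, all = share_of_all), 2)
```

For crime types the two shares differ by less than half a percentage point, because the 15 omitted types are rare.
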
crime_locations <- chicago_crime %>% count(`Location Description`) %>% arrange(desc(n))
crime_locations
## # A tibble: 173 × 2
##            `Location Description`       n
##                             <chr>   <int>
## 1                          STREET 1663980
## 2                       RESIDENCE 1059717
## 3                       APARTMENT  636437
## 4                        SIDEWALK  626755
## 5                           OTHER  236301
## 6  PARKING LOT/GARAGE(NON.RESID.)  179408
## 7                           ALLEY  141544
## 8        SCHOOL, PUBLIC, BUILDING  136155
## 9                RESIDENCE-GARAGE  123949
## 10        RESIDENCE PORCH/HALLWAY  109648
## # ... with 163 more rows
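# Keep only the 20 most frequent locations; the chart's percentages are relative to these 20
locations <- head(crime_locations$n, 20)
names(locations) <- head(crime_locations$`Location Description`, 20)
pareto.chart(x = locations)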

##                                 
## Pareto chart analysis for locations
##                                  Frequency Cum.Freq. Percentage
##   STREET                           1663980   1663980 29.4861167
##   RESIDENCE                        1059717   2723697 18.7784343
##   APARTMENT                         636437   3360134 11.2778132
##   SIDEWALK                          626755   3986889 11.1062459
##   OTHER                             236301   4223190  4.1873093
##   PARKING LOT/GARAGE(NON.RESID.)    179408   4402598  3.1791519
##   ALLEY                             141544   4544142  2.5081930
##   SCHOOL, PUBLIC, BUILDING          136155   4680297  2.4126986
##   RESIDENCE-GARAGE                  123949   4804246  2.1964054
##   RESIDENCE PORCH/HALLWAY           109648   4913894  1.9429883
##   SMALL RETAIL STORE                106566   5020460  1.8883746
##   VEHICLE NON-COMMERCIAL             99313   5119773  1.7598497
##   RESTAURANT                         92848   5212621  1.6452884
##   GROCERY FOOD STORE                 81027   5293648  1.4358175
##   DEPARTMENT STORE                   75425   5369073  1.3365487
##   GAS STATION                        65162   5434235  1.1546860
##   RESIDENTIAL YARD (FRONT/BACK)      59881   5494116  1.0611054
##   CHA PARKING LOT/GROUNDS            54410   5548526  0.9641580
##   PARK PROPERTY                      48619   5597145  0.8615401
##   COMMERCIAL / BUSINESS OFFICE       46121   5643266  0.8172750
##                                 
## Pareto chart analysis for locations
##                                  Cum.Percent.
##   STREET                             29.48612
##   RESIDENCE                          48.26455
##   APARTMENT                          59.54236
##   SIDEWALK                           70.64861
##   OTHER                              74.83592
##   PARKING LOT/GARAGE(NON.RESID.)     78.01507
##   ALLEY                              80.52326
##   SCHOOL, PUBLIC, BUILDING           82.93596
##   RESIDENCE-GARAGE                   85.13237
##   RESIDENCE PORCH/HALLWAY            87.07536
##   SMALL RETAIL STORE                 88.96373
##   VEHICLE NON-COMMERCIAL             90.72358
##   RESTAURANT                         92.36887
##   GROCERY FOOD STORE                 93.80469
##   DEPARTMENT STORE                   95.14124
##   GAS STATION                        96.29592
##   RESIDENTIAL YARD (FRONT/BACK)      97.35703
##   CHA PARKING LOT/GROUNDS            98.32118
##   PARK PROPERTY                      99.18273
##   COMMERCIAL / BUSINESS OFFICE      100.00000
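
With 173 distinct locations, restricting the chart to the top 20 changes the denominator noticeably. The sketch below counts how many locations are needed to reach 80% under each denominator, using only the frequencies printed above. `n_to_reach` is a hypothetical helper written for this illustration, not part of `qcc` or `dplyr`.

```r
# Frequencies of the 20 most common locations, copied from the output above
loc_counts <- c(1663980, 1059717, 636437, 626755, 236301, 179408, 141544,
                136155, 123949, 109648, 106566, 99313, 92848, 81027,
                75425, 65162, 59881, 54410, 48619, 46121)
total_records <- 6263265  # row count reported by dim(chicago_crime)

# Hypothetical helper: how many of the largest categories are needed
# before their cumulative count reaches `threshold` of `total`
n_to_reach <- function(counts, total, threshold = 0.8) {
  cum_share <- cumsum(sort(counts, decreasing = TRUE)) / total
  hit <- which(cum_share >= threshold)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

n_to_reach(loc_counts, sum(loc_counts))  # relative to the top 20 only
n_to_reach(loc_counts, total_records)    # relative to every record
```

Relative to the top 20, seven locations are enough; relative to all records, eleven are needed. The second figure assumes every row counts toward the total, including rows whose location is missing or falls outside the top 20.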

Conclusions

  1. The Chicago crime dataset is a genuine big data case: 6,263,265 records with 22 variables, occupying 1,472,721,220 bytes as an Excel file.
  2. Applying the Pareto principle to the data shows that a small number of categories account for most reported incidents.
  3. The main types of crime in Chicago are THEFT, BATTERY, CRIMINAL DAMAGE, NARCOTICS, OTHER OFFENSE, ASSAULT and BURGLARY. These seven of the 35 types account for about 80% of all reported crimes.
  4. The main locations of crimes in Chicago are STREET, RESIDENCE, APARTMENT, SIDEWALK, OTHER, PARKING LOT/GARAGE and ALLEY. These seven locations account for about 80% of incidents in the 20 most frequent locations, or roughly 73% of all 6,263,265 records.