Chicago Crime Data

Introduction

The following data was downloaded from https://catalog.data.gov/dataset/crimes-2001-to-present-398a4. This dataset covers crimes reported in the city of Chicago from 2001 to the present. Each record contains the location of the crime, the type of crime, the date it was reported, and other descriptive information.

crime<-read.csv("C:/Users/ianh1/Downloads/Crimes2001topresent.csv")
str(crime)
## 'data.frame':    6753677 obs. of  22 variables:
##  $ ID                  : int  10000092 10000094 10000095 10000096 10000097 10000098 10000099 10000100 10000101 10000104 ...
##  $ Case.Number         : Factor w/ 6753272 levels "",".JB299184",..: 5757556 5757692 5757686 5757688 5757632 5757677 5757684 5757644 5757669 5757625 ...
##  $ Date                : Factor w/ 2702731 levels "01/01/2001 01:00:00 AM",..: 538445 538538 538530 538520 538478 538510 538538 538497 538512 538490 ...
##  $ Block               : Factor w/ 59915 levels "0000X E 100 PL",..: 36753 47516 34786 38952 36667 37660 49045 33643 29883 55887 ...
##  $ IUCR                : Factor w/ 401 levels "0110","0141",..: 31 360 53 41 19 41 53 53 258 41 ...
##  $ Primary.Type        : Factor w/ 35 levels "ARSON","ASSAULT",..: 3 26 3 3 31 3 3 3 19 3 ...
##  $ Description         : Factor w/ 379 levels "$300 AND UNDER",..: 41 258 138 316 54 316 138 138 274 316 ...
##  $ Location.Description: Factor w/ 180 levels "","ABANDONED BUILDING",..: 159 159 19 19 155 19 19 19 159 141 ...
##  $ Arrest              : Factor w/ 2 levels "false","true": 1 2 1 1 1 1 1 1 2 1 ...
##  $ Domestic            : Factor w/ 2 levels "false","true": 1 1 2 1 1 1 2 2 1 1 ...
##  $ Beat                : int  1111 725 222 225 1113 223 733 213 912 511 ...
##  $ District            : int  11 7 2 2 11 2 7 2 9 5 ...
##  $ Ward                : int  28 15 4 3 28 4 17 3 11 6 ...
##  $ Community.Area      : int  25 67 39 40 25 39 68 38 59 49 ...
##  $ FBI.Code            : Factor w/ 26 levels "01A","01B","02",..: 6 26 11 11 4 11 11 11 21 11 ...
##  $ X.Coordinate        : int  1144606 1166468 1185075 1178033 1144920 1183018 1170859 1178746 1164279 1179637 ...
##  $ Y.Coordinate        : int  1903566 1860715 1875622 1870804 1898709 1872537 1858210 1876914 1880656 1840444 ...
##  $ Year                : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ Updated.On          : Factor w/ 2471 levels "01/01/2007 07:32:02 AM",..: 233 233 233 233 233 233 233 233 233 233 ...
##  $ Latitude            : num  41.9 41.8 41.8 41.8 41.9 ...
##  $ Longitude           : num  -87.7 -87.7 -87.6 -87.6 -87.7 ...
##  $ Location            : Factor w/ 860499 levels "","(36.619446395, -91.686565684)",..: 528673 232195 357606 325048 482419 337080 210407 364570 379567 79045 ...
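
The str() output shows that Date is stored as a factor with about 2.7 million levels, which is awkward for any time-based work. A minimal sketch of parsing it follows. It assumes every timestamp uses the month/day/year, 12-hour format seen in the first factor level. It also assumes Chicago local time, since the source does not state a time zone. The second timestamp is invented for illustration.

```r
# Example timestamps in the format shown by str(); the second value is made up.
date_strings <- c("01/01/2001 01:00:00 AM", "03/18/2015 07:44:00 PM")

# %I is the hour on a 12-hour clock and %p is the AM/PM marker.
parsed <- as.POSIXct(date_strings,
                     format = "%m/%d/%Y %I:%M:%S %p",
                     tz = "America/Chicago")  # assumed time zone

# Show the parsed values on a 24-hour clock.
format(parsed, "%Y-%m-%d %H:%M")
```

On the real data frame the same call would take as.character(crime$Date) as input, because the column is a factor.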

Due to the very large number of observations (6,753,677), a subset of the data will be taken that contains only crimes committed in 2017.

crime2017<-crime[crime$Year==2017,]
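
Subsetting a data frame keeps every factor level from the full data, including levels that no longer occur in the subset. The sketch below uses a small invented data frame (toy is hypothetical, not part of the analysis) to show how droplevels() removes them.

```r
# Toy data standing in for the crime table; the values are invented.
toy <- data.frame(
  Year = c(2016, 2017, 2017),
  Primary.Type = factor(c("ARSON", "THEFT", "BATTERY"))
)

# Same style of subset as above: keep only the 2017 rows.
toy2017 <- toy[toy$Year == 2017, ]
nlevels(toy2017$Primary.Type)  # still counts ARSON, which only occurs in 2016

# Drop factor levels that have no rows left in the subset.
toy2017 <- droplevels(toy2017)
levels(toy2017$Primary.Type)
```

Applying droplevels() to crime2017 would work the same way, if the unused levels get in the way of later steps.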

Now that the data set is more manageable, the location and type of each crime will be plotted using the leaflet package to give an idea of where crimes are being reported.

Map

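library(leaflet)
# Clustered marker map; leaflet picks up the Latitude and Longitude columns by name.
mymap <- crime2017 %>%
  leaflet() %>%
  addTiles() %>%
  # Each popup shows the crime type as text.
  addMarkers(popup = ~as.character(Primary.Type), clusterOptions = markerClusterOptions())
mymap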
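
The Location factor in the str() output includes an empty level, which suggests that some records have no coordinates. The sketch below uses an invented data frame (toy and its values are hypothetical) to show one way to drop such rows before they reach addMarkers().

```r
# Toy records; the second one has no coordinates.
toy <- data.frame(
  Primary.Type = c("THEFT", "BATTERY", "ASSAULT"),
  Latitude  = c(41.9, NA, 41.8),
  Longitude = c(-87.7, NA, -87.6)
)

# Keep only the rows where both coordinates are present.
has_coords <- complete.cases(toy[, c("Latitude", "Longitude")])
mappable <- toy[has_coords, ]
nrow(mappable)
```

With the real data, the same filter would be applied to crime2017 before it is piped into leaflet().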

Next, a pie chart of crimes by type gives an idea of how often each type of crime occurs and how frequently arrests are made for it.

suppressPackageStartupMessages(library(sqldf))
suppressPackageStartupMessages(library(plotly))
# Count crimes and arrests per crime type, then compute the arrest percentage.
piedata<-sqldf("select *, round((cast(arrestcount as float(3))/crimecount) * 100,2) as arrestpercentage from (select *, count(*) as crimecount, sum(case when Arrest='true' then 1 else 0 end) as arrestcount from crime2017 group by [Primary.Type])")
# Pie chart of crime counts; the hover text shows the count and arrest percentage.
p <- plot_ly(piedata, labels = ~Primary.Type, values = ~crimecount, type = 'pie',
              textposition = 'inside',
              textinfo = 'label+percent',
              insidetextfont = list(color = '#FFFFFF'),
              hoverinfo = 'text',
              text = ~paste('Crime: ',Primary.Type,'\n','Number of crimes reported: ', crimecount,'\nArrested', arrestpercentage,'%'),
              # No custom palette is defined, so plotly's default colors are used.
              marker = list(line = list(color = '#FFFFFF', width = 1)),
              showlegend = FALSE) %>%
     layout(title = 'Crimes Reported',
            xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
            yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

Pie Chart of Crimes

p
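
The query behind the pie chart counts crimes and arrests per type, then derives an arrest percentage. The same computation can be sketched in base R. The data frame toy and its values are invented for illustration only.

```r
# Toy records: three thefts (one arrest) and two narcotics cases (two arrests).
toy <- data.frame(
  Primary.Type = c("THEFT", "THEFT", "THEFT", "NARCOTICS", "NARCOTICS"),
  Arrest = c("false", "true", "false", "true", "true")
)

# Number of crimes and number of arrests per crime type.
crimecount  <- tapply(toy$Arrest, toy$Primary.Type, length)
arrestcount <- tapply(toy$Arrest == "true", toy$Primary.Type, sum)

# Combine into one table with the arrest percentage rounded to two decimals.
summary_tbl <- data.frame(
  Primary.Type = names(crimecount),
  crimecount = as.vector(crimecount),
  arrestcount = as.vector(arrestcount),
  arrestpercentage = round(as.vector(arrestcount) / as.vector(crimecount) * 100, 2)
)
summary_tbl
```

This avoids the sqldf dependency, at the cost of a few more lines than the SQL version.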

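Lastly, a logistic regression will be used to predict whether an arrest is made, given the type of crime, district, latitude, and longitude. The reference group for the type of crime is theft, which is set with relevel() because R would otherwise default to the first level alphabetically (arson). The reference district is District 1.

crime2017$Primary.Type<-relevel(factor(crime2017$Primary.Type),ref = "THEFT")
crimeLogit<-glm(Arrest~Primary.Type + as.factor(District) + Latitude + Longitude,data = crime2017,family = "binomial")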
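
By default, R treats the first factor level (alphabetically first for text data) as the reference group in a model. The short sketch below uses an invented vector (types is hypothetical) to show how relevel() can make THEFT the baseline instead.

```r
# Toy crime types; the values are invented.
types <- factor(c("ARSON", "THEFT", "BATTERY", "THEFT"))
levels(types)[1]   # default reference: the first level alphabetically

# Move THEFT to the front so it becomes the reference level.
types <- relevel(types, ref = "THEFT")
levels(types)
```

Every coefficient for a crime type is then read as a difference in log-odds of arrest compared with theft.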

The stargazer package formats the regression results as an HTML table.

suppressPackageStartupMessages(library(stargazer))
stargazer(crimeLogit, type = "html", covariate.labels = c(
  "Crime Type: Arson", "Crime Type: Assault", "Crime Type: Battery", "Crime Type: Burglary",
  "Crime Type: Concealed Carry License Violation", "Crime Type: Criminal Sexual Assault",
  "Crime Type: Criminal Damage", "Crime Type: Criminal Trespass", "Crime Type: Deceptive Practice",
  "Crime Type: Gambling", "Crime Type: Homicide", "Crime Type: Human Trafficking",
  "Crime Type: Interference With Public Officer", "Crime Type: Intimidation", "Crime Type: Kidnapping",
  "Crime Type: Liquor Law Violation", "Crime Type: Motor Vehicle Theft", "Crime Type: Narcotics",
  "Crime Type: Non-Criminal", "Crime Type: Non-Criminal (Subject Specified)", "Crime Type: Obscenity",
  "Crime Type: Offense Involving Children", "Crime Type: Other Narcotic Violation",
  "Crime Type: Other Offense", "Crime Type: Prostitution", "Crime Type: Public Indecency",
  "Crime Type: Public Peace Violation", "Crime Type: Robbery", "Crime Type: Sex Offense",
  "Crime Type: Stalking", "Crime Type: Weapons Violation",
  "District: 2", "District: 3", "District: 4", "District: 5", "District: 6", "District: 7",
  "District: 8", "District: 9", "District: 10", "District: 11", "District: 12", "District: 14",
  "District: 15", "District: 16", "District: 17", "District: 18", "District: 19", "District: 20",
  "District: 22", "District: 24", "District: 25", "District: 31"))

Logistic Regression

Dependent variable: Arrest

Term                                           Coefficient   (Std. Error)

Crime Type: Arson                              -0.150        (0.168)
Crime Type: Assault                             0.670***     (0.023)
Crime Type: Battery                             0.834***     (0.018)
Crime Type: Burglary                           -0.796***     (0.044)
Crime Type: Concealed Carry License Violation   5.359***     (0.591)
Crime Type: Criminal Sexual Assault            -0.569***     (0.108)
Crime Type: Criminal Damage                    -0.541***     (0.028)
Crime Type: Criminal Trespass                   2.563***     (0.028)
Crime Type: Deceptive Practice                 -0.882***     (0.039)
Crime Type: Gambling                           16.766        (63.792)
Crime Type: Homicide                            1.011***     (0.092)
Crime Type: Human Trafficking                   0.253        (1.071)
Crime Type: Interference With Public Officer    5.124***     (0.140)
Crime Type: Intimidation                       -0.074        (0.281)
Crime Type: Kidnapping                         -0.198        (0.262)
Crime Type: Liquor Law Violation               16.789        (63.726)
Crime Type: Motor Vehicle Theft                -0.223***     (0.037)
Crime Type: Narcotics                           9.938***     (0.448)
Crime Type: Non-Criminal                       -1.378        (1.014)
Crime Type: Non-Criminal (Subject Specified)    2.063        (1.419)
Crime Type: Obscenity                           3.700***     (0.287)
Crime Type: Offense Involving Children          0.451***     (0.062)
Crime Type: Other Narcotic Violation            2.695***     (0.629)
Crime Type: Other Offense                       0.931***     (0.023)
Crime Type: Prostitution                       16.733        (32.545)
Crime Type: Public Indecency                   16.730        (278.435)
Crime Type: Public Peace Violation              2.951***     (0.057)
Crime Type: Robbery                            -0.290***     (0.037)
Crime Type: Sex Offense                         0.764***     (0.084)
Crime Type: Stalking                            0.266        (0.220)
Crime Type: Weapons Violation                   3.470***     (0.039)
District: 2                                    -0.439***     (0.048)
District: 3                                    -0.280***     (0.058)
District: 4                                    -0.224***     (0.072)
District: 5                                    -0.017        (0.091)
District: 6                                    -0.086        (0.067)
District: 7                                    -0.222***     (0.059)
District: 8                                    -0.391***     (0.066)
District: 9                                    -0.202***     (0.048)
District: 10                                   -0.126***     (0.048)
District: 11                                   -0.241***     (0.046)
District: 12                                   -0.477***     (0.041)
District: 14                                   -0.424***     (0.050)
District: 15                                   -0.387***     (0.059)
District: 16                                   -0.315***     (0.077)
District: 17                                   -0.476***     (0.065)
District: 18                                    0.015        (0.037)
District: 19                                   -0.218***     (0.051)
District: 20                                   -0.071        (0.069)
District: 22                                   -0.437***     (0.088)
District: 24                                   -0.370***     (0.074)
District: 25                                   -0.198***     (0.056)
District: 31                                  -12.488        (604.573)
Latitude                                       -0.538        (0.448)
Longitude                                      -0.148        (0.329)
Constant                                        7.570        (31.620)

Observations                                    264,996
Log Likelihood                                 -92,757.770
Akaike Inf. Crit.                               185,627.500

Note: *p<0.1; **p<0.05; ***p<0.01
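
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios relative to the reference group. The sketch below does this for three coefficients copied from the table. The vector names are chosen here for illustration.

```r
# Coefficients (log-odds) copied from the regression table.
log_odds <- c(Battery = 0.834, Burglary = -0.796, `Weapons Violation` = 3.470)

# Exponentiate to get odds ratios relative to the reference crime type.
odds_ratios <- round(exp(log_odds), 2)
odds_ratios
```

An odds ratio above 1 means higher odds of an arrest than for the reference crime type, holding district and location fixed. A value below 1 means lower odds. For the fitted model, exp(coef(crimeLogit)) returns the full set.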