Project Proposal

The Chicago Police Department (CPD) regularly publishes reported incidents of crime that have occurred in the city of Chicago. The data are based on preliminary investigations conducted by CPD. This project analyzes the data set to understand trends in criminal activity occurring in the city.

Data Source

The data are extracted from the Chicago Police Department’s CLEAR (Citizen Law Enforcement Analysis and Reporting) system. The data set was accessed at https://data.cityofchicago.org/Public-Safety/Crimes-2015/vwwp-7yr9/about. Although CPD publishes data for every year, the volume of data is very large, so I consider only the year 2015.

Packages Required

The following packages are used to execute the R code in this assignment:

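library(dplyr)      # data manipulation verbs such as select() and mutate()
library(tidyverse)  # also attaches dplyr, tibble and the other core tidyverse packages
library(RSocrata)   # read.socrata() for downloading data from Socrata portals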

Data Import

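# Address of the 2015 crimes data set on the Chicago Data Portal
url <- "https://data.cityofchicago.org/Public-Safety/Crimes-2015/vwwp-7yr9"
# Download the full data set into a data frame
crime_data_original <- read.socrata(url)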
dim(crime_data_original)
## [1] 262874     22
names(crime_data_original)
##  [1] "ID"                   "Case.Number"          "Date"                
##  [4] "Block"                "IUCR"                 "Primary.Type"        
##  [7] "Description"          "Location.Description" "Arrest"              
## [10] "Domestic"             "Beat"                 "District"            
## [13] "Ward"                 "Community.Area"       "FBI.Code"            
## [16] "X.Coordinate"         "Y.Coordinate"         "Year"                
## [19] "Updated.On"           "Latitude"             "Longitude"           
## [22] "Location"
# Keep the 11 variables of interest
crime_data <- crime_data_original %>%
  select(Case.Number, Date, Primary.Type, Description, Location.Description,
         Arrest, Domestic, District, Year, Latitude, Longitude)

# Store the timestamp as POSIXct in a new CrimeDate column and drop Date
crime_data <- crime_data %>%
  mutate(CrimeDate = as.POSIXct(Date)) %>%
  select(-Date)
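
If the Date column arrives as character strings rather than date-times, calling as.POSIXct() without arguments relies on its built-in list of candidate formats and on the session's local time zone. The sketch below parses the timestamp explicitly instead. It runs on a hypothetical three-row data frame (all values made up), and both the format string and the "America/Chicago" time zone are assumptions, not details taken from the CPD data.

```r
# Hypothetical toy data standing in for crime_data (values are made up)
toy <- data.frame(
  Case.Number  = c("A1", "A2", "A3"),
  Date         = c("2015-12-31 23:59:00", "2015-07-04 12:30:00", NA),
  Primary.Type = c("THEFT", "BATTERY", "THEFT"),
  stringsAsFactors = FALSE
)

# Parse the timestamp with an explicit format and time zone
# (both are assumptions about how the raw strings look)
toy$CrimeDate <- as.POSIXct(toy$Date, format = "%Y-%m-%d %H:%M:%S",
                            tz = "America/Chicago")

# Drop the original column so CrimeDate ends up as the last column
toy$Date <- NULL

# A missing input string simply becomes NA in CrimeDate
str(toy)
```

Stating the format up front makes a malformed timestamp show up as NA instead of being parsed by a different fallback format.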

Data Description

This section gives a brief description of the data set. Of the 22 variables in the original data set, 11 relevant variables are retained.

Variable Name          Description
Case.Number            Unique number of a criminal case
CrimeDate              Date and time at which the crime was committed (the original Date column, converted to POSIXct)
Primary.Type           Type of crime committed
Description            Detailed description of the crime
Location.Description   Broad category of the location where the crime happened
Arrest                 Whether an arrest was made in the case
Domestic               Whether the crime was domestic-related
District               Number of the police district where the incident happened
Year                   Year of the incident (only 2015 is considered)
Latitude               Latitude of the incident location
Longitude              Longitude of the incident location
#Number of rows and variables
dim(crime_data)
## [1] 262874     11
#Names of variables
names(crime_data)
##  [1] "Case.Number"          "Primary.Type"         "Description"         
##  [4] "Location.Description" "Arrest"               "Domestic"            
##  [7] "District"             "Year"                 "Latitude"            
## [10] "Longitude"            "CrimeDate"
#Checking the first and last rows
head(crime_data)
##   Case.Number    Primary.Type                   Description
## 1    HZ100370 CRIMINAL DAMAGE                    TO VEHICLE
## 2    HZ199559           THEFT                 FROM BUILDING
## 3    HZ100006         BATTERY AGGRAVATED: OTHER DANG WEAPON
## 4    HZ100002         BATTERY                        SIMPLE
## 5    HZ100010           THEFT                $500 AND UNDER
## 6    HZ100487           THEFT                $500 AND UNDER
##      Location.Description Arrest Domestic District Year Latitude Longitude
## 1                  STREET  false    false        6 2015 41.75737 -87.64299
## 2 RESIDENCE PORCH/HALLWAY  false    false       14 2015 41.90947 -87.70700
## 3                  STREET  false    false        4 2015 41.75127 -87.58582
## 4                SIDEWALK   true    false       19 2015 41.94984 -87.65864
## 5               APARTMENT  false    false       24 2015 42.01680 -87.69071
## 6                  STREET  false    false        1 2015 41.88817 -87.62294
##             CrimeDate
## 1 2015-12-31 23:59:00
## 2 2015-12-31 23:59:00
## 3 2015-12-31 23:55:00
## 4 2015-12-31 23:50:00
## 5 2015-12-31 23:50:00
## 6 2015-12-31 23:45:00
tail(crime_data)
##        Case.Number               Primary.Type
## 262869    HZ355526 OFFENSE INVOLVING CHILDREN
## 262870    HY381224 OFFENSE INVOLVING CHILDREN
## 262871    HZ386573                SEX OFFENSE
## 262872    HZ454893                SEX OFFENSE
## 262873    HZ490274         DECEPTIVE PRACTICE
## 262874    HZ507492        CRIM SEXUAL ASSAULT
##                           Description           Location.Description
## 262869 AGG SEX ASSLT OF CHILD FAM MBR                      RESIDENCE
## 262870 AGG SEX ASSLT OF CHILD FAM MBR                      RESIDENCE
## 262871 SEXUAL EXPLOITATION OF A CHILD                      RESIDENCE
## 262872      AGG CRIMINAL SEXUAL ABUSE                      RESIDENCE
## 262873                   EMBEZZLEMENT FACTORY/MANUFACTURING BUILDING
## 262874                      PREDATORY                      RESIDENCE
##        Arrest Domestic District Year Latitude Longitude  CrimeDate
## 262869  false     true        9 2015       NA        NA 2015-01-01
## 262870  false    false       16 2015       NA        NA 2015-01-01
## 262871  false    false       11 2015       NA        NA 2015-01-01
## 262872  false    false        5 2015       NA        NA 2015-01-01
## 262873  false    false       10 2015       NA        NA 2015-01-01
## 262874  false     true       14 2015       NA        NA 2015-01-01
summary(crime_data)
##    Case.Number              Primary.Type  
##  HY346207:     4   THEFT          :57285  
##  HY442430:     3   BATTERY        :48893  
##  HY259141:     3   CRIMINAL DAMAGE:28669  
##  HY551250:     2   NARCOTICS      :23824  
##  HY442160:     2   OTHER OFFENSE  :17526  
##  HY406845:     2   ASSAULT        :17041  
##  (Other) :262858   (Other)        :69636  
##                   Description                         Location.Description
##  SIMPLE                 : 27392   STREET                        :60749    
##  $500 AND UNDER         : 24657   RESIDENCE                     :40980    
##  DOMESTIC BATTERY SIMPLE: 24604   APARTMENT                     :34713    
##  TO VEHICLE             : 14281   SIDEWALK                      :27865    
##  TO PROPERTY            : 13296   OTHER                         :10503    
##  OVER $500              : 12689   PARKING LOT/GARAGE(NON.RESID.): 7434    
##  (Other)                :145955   (Other)                       :80630    
##    Arrest        Domestic         District          Year     
##  false:193557   false:221210   Min.   : 1.00   Min.   :2015  
##  true : 69317   true : 41664   1st Qu.: 6.00   1st Qu.:2015  
##                                Median :10.00   Median :2015  
##                                Mean   :11.21   Mean   :2015  
##                                3rd Qu.:16.00   3rd Qu.:2015  
##                                Max.   :31.00   Max.   :2015  
##                                                              
##     Latitude       Longitude        CrimeDate                  
##  Min.   :41.64   Min.   :-87.93   Min.   :2015-01-01 00:00:00  
##  1st Qu.:41.77   1st Qu.:-87.72   1st Qu.:2015-04-11 10:16:15  
##  Median :41.86   Median :-87.67   Median :2015-07-06 21:00:00  
##  Mean   :41.84   Mean   :-87.67   Mean   :2015-07-05 06:46:42  
##  3rd Qu.:41.90   3rd Qu.:-87.63   3rd Qu.:2015-09-29 01:41:45  
##  Max.   :42.02   Max.   :-87.52   Max.   :2015-12-31 23:59:00  
##  NA's   :3061    NA's   :3061     NA's   :8

Data Cleaning

#Counting missing values
sum(is.na(crime_data))
## [1] 6130
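
The total of 6130 missing values is the sum of the NA counts reported in the summary above: Latitude (3061), Longitude (3061) and CrimeDate (8), since 3061 + 3061 + 8 = 6130. A minimal sketch of breaking the count down per column with colSums(is.na(...)), on a small hypothetical data frame with the same pattern (coordinates missing together, plus one missing timestamp):

```r
# Hypothetical toy data (values are made up): two rows lack coordinates,
# one row lacks a timestamp
toy <- data.frame(
  Primary.Type = c("THEFT", "BATTERY", "THEFT", "ASSAULT"),
  Latitude     = c(41.75, NA, 41.90, NA),
  Longitude    = c(-87.64, NA, -87.62, NA),
  CrimeDate    = as.POSIXct(c("2015-01-01 00:00:00", "2015-03-01 10:00:00",
                              NA, "2015-06-01 08:00:00"),
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
)

# is.na() on a data frame gives a logical matrix; colSums() counts per column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# The overall total equals the sum of the per-column counts
total_na <- sum(is.na(toy))
print(total_na)
```

Seeing the counts per column makes it clear which variables drive the total before deciding which rows to drop.
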
# Show the rows that have a missing CrimeDate
crime_data[!complete.cases(crime_data$CrimeDate),]
##        Case.Number Primary.Type             Description
## 221286    HY177945        THEFT           FROM BUILDING
## 221287    HY175936      BATTERY DOMESTIC BATTERY SIMPLE
## 221288    HY176002      BATTERY                  SIMPLE
## 221289    HY175964      BATTERY DOMESTIC BATTERY SIMPLE
## 221290    HY176191        THEFT           FROM BUILDING
## 221291    HY176588        THEFT          $500 AND UNDER
## 221292    HY177362        THEFT          $500 AND UNDER
## 221293    HY179605        THEFT           FROM BUILDING
##                Location.Description Arrest Domestic District Year Latitude
## 221286                   RESTAURANT  false    false       18 2015 41.89001
## 221287                    APARTMENT  false     true        3 2015 41.75967
## 221288                       STREET  false    false        6 2015 41.75556
## 221289                    APARTMENT   true     true       11 2015 41.90163
## 221290                BAR OR TAVERN  false    false       12 2015 41.90317
## 221291                        OTHER  false    false       12 2015 41.88539
## 221292                        OTHER  false    false        5 2015 41.67750
## 221293 COMMERCIAL / BUSINESS OFFICE  false    false        1 2015 41.85102
##        Longitude CrimeDate
## 221286 -87.63166      <NA>
## 221287 -87.57922      <NA>
## 221288 -87.65023      <NA>
## 221289 -87.71896      <NA>
## 221290 -87.67844      <NA>
## 221291 -87.66355      <NA>
## 221292 -87.65655      <NA>
## 221293 -87.61895      <NA>
# Remove the 8 rows with a missing CrimeDate; rows with missing
# Latitude/Longitude (3061) are kept for now
crime_data <- crime_data[!is.na(crime_data$CrimeDate), ]
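
Removing rows with negative indexing on which() has a pitfall worth knowing: when no element matches, which() returns integer(0), and x[-integer(0), ] selects no rows at all rather than all of them. The sketch below shows this on a hypothetical data frame that has no missing dates; a logical filter (or dplyr's filter(!is.na(CrimeDate))) keeps every row in that case.

```r
# Hypothetical data frame with no missing dates at all
df <- data.frame(
  id        = 1:3,
  CrimeDate = as.POSIXct(c("2015-01-01 00:00:00", "2015-01-02 00:00:00",
                           "2015-01-03 00:00:00"),
                         format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
)

# which() finds no NAs and returns integer(0); negating an empty
# index selects zero rows instead of all rows
bad <- df[-which(is.na(df$CrimeDate)), ]

# A logical filter keeps every row that has a date
good <- df[!is.na(df$CrimeDate), ]

nrow(bad)   # 0
nrow(good)  # 3
```

The logical version also behaves correctly when the cleaning step is re-run on data that has already been cleaned.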

Planned Analysis

Upon completion, this project will help answer the following questions:

1. What are the major categories of crime that occur most frequently?
2. Which places do criminals usually prefer, so that CPD can target those specific areas for surveillance?
3. How many arrests were made, based on the preliminary investigation?
4. What share of the total reported crime incidents is domestic crime?
5. Which areas (districts) are most affected by criminal activity? Latitude/longitude may be used in addition to district-level data.
6. Other questions that may arise during the analysis.
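
A minimal base-R sketch of how questions 1 to 5 could be tabulated, run on a hypothetical six-row sample whose values are all made up. Because the summary output lists Arrest and Domestic with lowercase false/true levels, the sketch stores them as strings and compares against "true"; if these columns come back as logical instead, the comparison can be dropped. The latitude/longitude part of question 5 is not covered here.

```r
# Hypothetical toy sample standing in for the cleaned crime_data
toy <- data.frame(
  Primary.Type = c("THEFT", "BATTERY", "THEFT", "NARCOTICS", "THEFT", "BATTERY"),
  Location.Description = c("STREET", "APARTMENT", "STREET", "SIDEWALK",
                           "RESIDENCE", "STREET"),
  Arrest   = c("false", "true", "false", "true", "false", "false"),
  Domestic = c("false", "true", "false", "false", "false", "true"),
  District = c(6, 14, 6, 4, 1, 6),
  stringsAsFactors = FALSE
)

# Q1: most frequent crime categories
type_counts <- sort(table(toy$Primary.Type), decreasing = TRUE)

# Q2: most frequent location categories
location_counts <- sort(table(toy$Location.Description), decreasing = TRUE)

# Q3: number and percentage of incidents with an arrest
n_arrests  <- sum(toy$Arrest == "true")
arrest_pct <- 100 * mean(toy$Arrest == "true")

# Q4: percentage of incidents flagged as domestic
domestic_pct <- 100 * mean(toy$Domestic == "true")

# Q5: incidents per police district
district_counts <- sort(table(toy$District), decreasing = TRUE)

print(type_counts)
print(location_counts)
cat("Arrests:", n_arrests, sprintf("(%.1f%%)", arrest_pct), "\n")
cat(sprintf("Domestic share: %.1f%%\n", domestic_pct))
print(district_counts)
```

With dplyr already loaded, the frequency tables could equally be written as, for example, crime_data %>% count(Primary.Type, sort = TRUE).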