The Chicago Police Department (CPD) regularly publishes reported incidents of crime that have occurred in the city of Chicago. The data is based on preliminary investigations conducted by CPD. This analysis explores the data set to understand trends in criminal activity in the city.
The data is extracted from the Chicago Police Department’s CLEAR (Citizen Law Enforcement Analysis and Reporting) system and was accessed through https://data.cityofchicago.org/Public-Safety/Crimes-2015/vwwp-7yr9/about. Although CPD publishes data for all years, I consider only 2015 because of the sheer volume of data.
The following packages are used to run the R code in this assignment:
library(dplyr)
library(tidyverse)
library(RSocrata)
url <- "https://data.cityofchicago.org/Public-Safety/Crimes-2015/vwwp-7yr9"
crime_data_original <- read.socrata(url)
dim(crime_data_original)
## [1] 262874 22
names(crime_data_original)
## [1] "ID" "Case.Number" "Date"
## [4] "Block" "IUCR" "Primary.Type"
## [7] "Description" "Location.Description" "Arrest"
## [10] "Domestic" "Beat" "District"
## [13] "Ward" "Community.Area" "FBI.Code"
## [16] "X.Coordinate" "Y.Coordinate" "Year"
## [19] "Updated.On" "Latitude" "Longitude"
## [22] "Location"
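# Keep only the 11 variables relevant to the analysis
crime_data <- crime_data_original %>%
  select(Case.Number, Date, Primary.Type, Description, Location.Description,
         Arrest, Domestic, District, Year, Latitude, Longitude)
# Convert Date to POSIXct and store it as CrimeDate in place of Date
CrimeDate <- as.POSIXct(crime_data$Date)
crime_data <- cbind(select(crime_data, -c(Date)), CrimeDate)
# Tibble copy of the data; it is stored as crime_date, so crime_data itself remains a data frame
crime_date <- as_tibble(crime_data)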
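The conversion above calls `as.POSIXct()` without an explicit format or time zone, so the result depends on the session settings. The following is a minimal sketch of a more explicit call. It assumes the timestamps arrive as character strings in `YYYY-MM-DD HH:MM:SS` form and are local to Chicago; both are assumptions about the data, and the sample values are illustrative only.

```r
# Hypothetical sample timestamps; the last one is deliberately malformed
raw_dates <- c("2015-12-31 23:59:00", "2015-07-06 21:00:00", "not a date")

# Parse with an explicit format and time zone (assumed, not confirmed by the source)
parsed <- as.POSIXct(raw_dates,
                     format = "%Y-%m-%d %H:%M:%S",
                     tz = "America/Chicago")

# With a format supplied, values that cannot be parsed become NA
is.na(parsed)
```

Checking `is.na()` right after parsing makes it easy to see how many timestamps failed to convert before any rows are dropped.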
This section briefly describes the data set. Of the 22 variables in the original data set, the 11 relevant variables are retained.
| Variable Name | Description |
|---|---|
| Case.Number | Unique number of a criminal case |
| Date | Date on which the crime was committed (stored as CrimeDate after conversion) |
| Primary.Type | Type of crime committed |
| Description | Detailed description of crime |
| Location.Description | Broad category of location where crime happened |
| Arrest | Whether an arrest was made in the case |
| Domestic | Whether the crime was a domestic crime |
| District | District number where the incident happened |
| Year | Only year 2015 is considered |
| Latitude | Latitude of exact location |
| Longitude | Longitude of exact location |
#Number of rows and variables
dim(crime_data)
## [1] 262874 11
#Names of variables
names(crime_data)
## [1] "Case.Number" "Primary.Type" "Description"
## [4] "Location.Description" "Arrest" "Domestic"
## [7] "District" "Year" "Latitude"
## [10] "Longitude" "CrimeDate"
#Checking top and bottom values
head(crime_data)
## Case.Number Primary.Type Description
## 1 HZ100370 CRIMINAL DAMAGE TO VEHICLE
## 2 HZ199559 THEFT FROM BUILDING
## 3 HZ100006 BATTERY AGGRAVATED: OTHER DANG WEAPON
## 4 HZ100002 BATTERY SIMPLE
## 5 HZ100010 THEFT $500 AND UNDER
## 6 HZ100487 THEFT $500 AND UNDER
## Location.Description Arrest Domestic District Year Latitude Longitude
## 1 STREET false false 6 2015 41.75737 -87.64299
## 2 RESIDENCE PORCH/HALLWAY false false 14 2015 41.90947 -87.70700
## 3 STREET false false 4 2015 41.75127 -87.58582
## 4 SIDEWALK true false 19 2015 41.94984 -87.65864
## 5 APARTMENT false false 24 2015 42.01680 -87.69071
## 6 STREET false false 1 2015 41.88817 -87.62294
## CrimeDate
## 1 2015-12-31 23:59:00
## 2 2015-12-31 23:59:00
## 3 2015-12-31 23:55:00
## 4 2015-12-31 23:50:00
## 5 2015-12-31 23:50:00
## 6 2015-12-31 23:45:00
tail(crime_data)
## Case.Number Primary.Type
## 262869 HZ355526 OFFENSE INVOLVING CHILDREN
## 262870 HY381224 OFFENSE INVOLVING CHILDREN
## 262871 HZ386573 SEX OFFENSE
## 262872 HZ454893 SEX OFFENSE
## 262873 HZ490274 DECEPTIVE PRACTICE
## 262874 HZ507492 CRIM SEXUAL ASSAULT
## Description Location.Description
## 262869 AGG SEX ASSLT OF CHILD FAM MBR RESIDENCE
## 262870 AGG SEX ASSLT OF CHILD FAM MBR RESIDENCE
## 262871 SEXUAL EXPLOITATION OF A CHILD RESIDENCE
## 262872 AGG CRIMINAL SEXUAL ABUSE RESIDENCE
## 262873 EMBEZZLEMENT FACTORY/MANUFACTURING BUILDING
## 262874 PREDATORY RESIDENCE
## Arrest Domestic District Year Latitude Longitude CrimeDate
## 262869 false true 9 2015 NA NA 2015-01-01
## 262870 false false 16 2015 NA NA 2015-01-01
## 262871 false false 11 2015 NA NA 2015-01-01
## 262872 false false 5 2015 NA NA 2015-01-01
## 262873 false false 10 2015 NA NA 2015-01-01
## 262874 false true 14 2015 NA NA 2015-01-01
summary(crime_data)
## Case.Number Primary.Type
## HY346207: 4 THEFT :57285
## HY442430: 3 BATTERY :48893
## HY259141: 3 CRIMINAL DAMAGE:28669
## HY551250: 2 NARCOTICS :23824
## HY442160: 2 OTHER OFFENSE :17526
## HY406845: 2 ASSAULT :17041
## (Other) :262858 (Other) :69636
## Description Location.Description
## SIMPLE : 27392 STREET :60749
## $500 AND UNDER : 24657 RESIDENCE :40980
## DOMESTIC BATTERY SIMPLE: 24604 APARTMENT :34713
## TO VEHICLE : 14281 SIDEWALK :27865
## TO PROPERTY : 13296 OTHER :10503
## OVER $500 : 12689 PARKING LOT/GARAGE(NON.RESID.): 7434
## (Other) :145955 (Other) :80630
## Arrest Domestic District Year
## false:193557 false:221210 Min. : 1.00 Min. :2015
## true : 69317 true : 41664 1st Qu.: 6.00 1st Qu.:2015
## Median :10.00 Median :2015
## Mean :11.21 Mean :2015
## 3rd Qu.:16.00 3rd Qu.:2015
## Max. :31.00 Max. :2015
##
## Latitude Longitude CrimeDate
## Min. :41.64 Min. :-87.93 Min. :2015-01-01 00:00:00
## 1st Qu.:41.77 1st Qu.:-87.72 1st Qu.:2015-04-11 10:16:15
## Median :41.86 Median :-87.67 Median :2015-07-06 21:00:00
## Mean :41.84 Mean :-87.67 Mean :2015-07-05 06:46:42
## 3rd Qu.:41.90 3rd Qu.:-87.63 3rd Qu.:2015-09-29 01:41:45
## Max. :42.02 Max. :-87.52 Max. :2015-12-31 23:59:00
## NA's :3061 NA's :3061 NA's :8
#Counting missing values
sum(is.na(crime_data))
## [1] 6130
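The total of 6130 agrees with the `summary()` output above: 3061 missing values each in Latitude and Longitude, plus 8 in CrimeDate. A per-column breakdown can be obtained directly with `colSums()`. The sketch below uses a small made-up data frame (`toy` is a hypothetical stand-in for `crime_data`) so that it runs on its own.

```r
# Toy data frame standing in for crime_data (values are illustrative only)
toy <- data.frame(
  Case.Number = c("A1", "A2", "A3", "A4"),
  Latitude    = c(41.75, NA, 41.90, NA),
  Longitude   = c(-87.64, NA, -87.70, NA),
  stringsAsFactors = FALSE
)

# Total number of missing cells, as in sum(is.na(crime_data))
total_na <- sum(is.na(toy))

# Missing cells broken down by column
na_by_column <- colSums(is.na(toy))
na_by_column
```

Applied to the real data, `colSums(is.na(crime_data))` would show which variables account for the missing values.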
# Show the rows with a missing CrimeDate
crime_data[!complete.cases(crime_data$CrimeDate),]
## Case.Number Primary.Type Description
## 221286 HY177945 THEFT FROM BUILDING
## 221287 HY175936 BATTERY DOMESTIC BATTERY SIMPLE
## 221288 HY176002 BATTERY SIMPLE
## 221289 HY175964 BATTERY DOMESTIC BATTERY SIMPLE
## 221290 HY176191 THEFT FROM BUILDING
## 221291 HY176588 THEFT $500 AND UNDER
## 221292 HY177362 THEFT $500 AND UNDER
## 221293 HY179605 THEFT FROM BUILDING
## Location.Description Arrest Domestic District Year Latitude
## 221286 RESTAURANT false false 18 2015 41.89001
## 221287 APARTMENT false true 3 2015 41.75967
## 221288 STREET false false 6 2015 41.75556
## 221289 APARTMENT true true 11 2015 41.90163
## 221290 BAR OR TAVERN false false 12 2015 41.90317
## 221291 OTHER false false 12 2015 41.88539
## 221292 OTHER false false 5 2015 41.67750
## 221293 COMMERCIAL / BUSINESS OFFICE false false 1 2015 41.85102
## Longitude CrimeDate
## 221286 -87.63166 <NA>
## 221287 -87.57922 <NA>
## 221288 -87.65023 <NA>
## 221289 -87.71896 <NA>
## 221290 -87.67844 <NA>
## 221291 -87.66355 <NA>
## 221292 -87.65655 <NA>
## 221293 -87.61895 <NA>
# Remove the 8 rows with NA in the CrimeDate column only
crime_data <- crime_data[!is.na(crime_data$CrimeDate), ]
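When rows are removed based on missing values, a negative `which()` index and a logical index behave differently if no value happens to be missing. The sketch below illustrates this on made-up data (`toy` is hypothetical) and is meant only to show the difference.

```r
# Toy data frame with no missing values at all
toy <- data.frame(id = 1:3, when = c("a", "b", "c"), stringsAsFactors = FALSE)

# which() returns integer(0) here, and indexing with an empty vector
# selects zero rows instead of keeping all of them
dropped_all <- toy[-which(is.na(toy$when)), ]
nrow(dropped_all)

# Logical indexing keeps every row when nothing is missing
kept_all <- toy[!is.na(toy$when), ]
nrow(kept_all)
```

Logical indexing with `!is.na()` is therefore the safer choice if the code is rerun on data where the column has no missing values.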
Upon completion, this project will help us answer the following questions:

1. Which major categories of crime occur most frequently?
2. Which locations do criminals usually prefer, so that CPD can target those specific areas for surveillance?
3. How many arrests were made based on preliminary investigation?
4. What share of the total reported crime incidents is domestic crime?
5. Which areas (districts) are most affected by criminal activity? Latitude/longitude may be used in addition to district-level data.
6. Other questions that may arise during the analysis.
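As a rough illustration of how questions 1, 3, and 4 might be approached, the sketch below computes category frequencies, an arrest count, and a domestic share. It uses base R only and a small made-up data frame (`toy` is hypothetical, and its values are not taken from the CPD data), and it assumes Arrest and Domestic are coded as the strings "true" and "false", as the output above suggests.

```r
# Toy records standing in for crime_data (values are illustrative only)
toy <- data.frame(
  Primary.Type = c("THEFT", "BATTERY", "THEFT", "NARCOTICS", "THEFT", "BATTERY"),
  Arrest       = c("false", "true", "false", "true", "false", "false"),
  Domestic     = c("false", "true", "false", "false", "false", "true"),
  stringsAsFactors = FALSE
)

# Question 1: crime categories ordered from most to least frequent
type_counts <- sort(table(toy$Primary.Type), decreasing = TRUE)

# Question 3: number of incidents with an arrest
arrests <- sum(toy$Arrest == "true")

# Question 4: share of incidents flagged as domestic
domestic_share <- mean(toy$Domestic == "true")

type_counts
arrests
domestic_share
```

The same expressions could be applied to `crime_data` once the cleaning steps are complete; grouping by District or Location.Description in a similar way would address questions 2 and 5.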