Crime is an international concern, but it is documented and handled in very different ways in different countries. In the United States, violent crimes and property crimes are recorded by the Federal Bureau of Investigation (FBI). Additionally, each city documents crime, and some cities release data regarding crime rates. The city of Chicag, Illinois releases crime data from 2001 onward online. Chicago is the third most populous city in the United States, with a population of over 2.7 million people.
There are two main types of crimes: violent crimes, and property crimes. In this proble, we'll focus on one specific type of property crime, called “motor vehicle theft” (sometimes referred to as grand theft auto). This is the act of stealing, or attempting to steal, a car. In this problem, we'll use some basic data analysis in R to understand the motor vehicle thefts in Chicago.
ID: a unique identifier for each observation
Date: the date the crime occurred
LocationDescription: the location where the crime occurred
Arrest: whether or not an arrest was made for the crime (TRUE if an arrest was made, and FALSE if an arrest was not made)
Domestic: whether or not the crime was a domestic crime, meaning that it was committed against a family member (TRUE if it was domestic, and FALSE if it was not domestic)
Beat: the area, or "beat" in which the crime occurred. This is the smallest regional division defined by the Chicago police department.
District: the police district in which the crime occured. Each district is composed of many beats, and are defined by the Chicago Police Department.
CommunityArea: the community area in which the crime occurred. Since the 1920s, Chicago has been divided into what are called "community areas", of which there are now 77. The community areas were devised in an attempt to create socially homogeneous regions.
Year: the year in which the crime occurred.
Latitude: the latitude of the location at which the crime occurred.
Longitude: the longitude of the location at which the crime occurred.
# Set the directory at where the data is located
setwd("/home/tarek/Analytics/Week1/Rlectures/Data")
# Read the Data
mvt <- read.csv("mvt.csv")
# Structure of data
str(mvt)
## 'data.frame': 191641 obs. of 11 variables:
## $ ID : int 8951354 8951141 8952745 8952223 8951608 8950793 8950760 8951611 8951802 8950706 ...
## $ Date : Factor w/ 131680 levels "10/10/01 0:00",..: 37302 37300 37300 37300 37299 37297 37296 37295 37291 37290 ...
## $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 62 72 72 72 72 72 72 72 ...
## $ Arrest : logi FALSE FALSE FALSE FALSE FALSE TRUE ...
## $ Domestic : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Beat : int 623 1213 1622 724 211 2521 423 231 1021 1215 ...
## $ District : int 6 12 16 7 2 25 4 2 10 12 ...
## $ CommunityArea : int 69 24 11 67 35 19 48 40 29 24 ...
## $ Year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
## $ Latitude : num 41.8 41.9 42 41.8 41.8 ...
## $ Longitude : num -87.6 -87.7 -87.8 -87.7 -87.6 ...
# number of crimes for which an arrest was made
sum(mvt$Arrest == "TRUE")
## [1] 15536
# Convert date characters into a Date object in R
DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))
summary(DateConvert)
## Min. 1st Qu. Median Mean 3rd Qu.
## "2001-01-01" "2003-07-10" "2006-05-21" "2006-08-23" "2009-10-24"
## Max.
## "2012-12-31"
# extract the month and the day of the week, and add these variables to
# our data frame mvt
mvt$Month = months(DateConvert)
mvt$Weekday = weekdays(DateConvert)
# Replace the old Date variable with DateConvert
mvt$Date = DateConvert
# month that had the fewest motor vehicle thefts
which.min(table(mvt$Month))
## February
## 4
# weekday that had the most motor vehicle thefts
which.max(table(mvt$Weekday))
## Friday
## 1
# month that has the largest number of motor vehicle thefts for which an
# arrest was made
which.max(table(mvt$Arrest == "TRUE", mvt$Month)[2, ])
## January
## 5
table(mvt$Month)
##
## April August December February January July June
## 15280 16572 16426 13511 16047 16801 16002
## March May November October September
## 15758 16035 16063 17086 16060
# Proportion of motor vehicle thefts in 2001 in which an arrest was made
sum(mvt$Year == 2001 & mvt$Arrest == "TRUE")/sum(mvt$Year == 2001)
## [1] 0.1041
# Arrests made in 2007 and 2012
sum(mvt$Year == 2007 & mvt$Arrest == "TRUE")/sum(mvt$Year == 2007)
## [1] 0.08487
sum(mvt$Year == 2012 & mvt$Arrest == "TRUE")/sum(mvt$Year == 2012)
## [1] 0.03903
Analyzing this data could be useful to the Chicago Police Department when deciding where to allocate resources. If they want to increase the number of arrests that are made for motor vehicle thefts, where should they focus their efforts?
sort(table(mvt$LocationDescription))[73:78]
##
## DRIVEWAY - RESIDENTIAL GAS STATION
## 1675 2111
## ALLEY OTHER
## 2308 4573
## PARKING LOT/GARAGE(NON.RESID.) STREET
## 14852 156564
# Distribution of crime. Study the trend to understand how crime changed
# over time
hist(mvt$Date, breaks = 100, ylab = "Monthly Number of Motor Vehicle Theft")
# Distribution of crime. Study the trend to understand how crime changed
# over time
boxplot(mvt$Date ~ mvt$Arrest)