Crime is an international concern, but it is documented and handled in very different ways in different countries. In the United States, violent crimes and property crimes are recorded by the Federal Bureau of Investigation (FBI). Additionally, each city documents crime, and some cities release data regarding crime rates. The city of Chicago, Illinois, releases crime data from 2001 onward online.

There are two main types of crimes: violent crimes and property crimes. In this problem, we’ll focus on one specific type of property crime, called “motor vehicle theft” (sometimes referred to as grand theft auto). This is the act of stealing, or attempting to steal, a car. We’ll use some basic data analysis in R to understand the motor vehicle thefts in Chicago.

Here is a list of descriptions of the variables:

ID : a unique identifier for each observation
Date : the date the crime occurred
LocationDescription : the location where the crime occurred
Arrest : whether or not an arrest was made for the crime (TRUE if an arrest was made, and FALSE if an arrest was not made)
Domestic : whether or not the crime was a domestic crime, meaning that it was committed against a family member (TRUE if it was domestic, and FALSE if it was not domestic)
Beat : the area, or “beat”, in which the crime occurred. This is the smallest regional division defined by the Chicago Police Department.
District : the police district in which the crime occurred. Each district is composed of many beats and is defined by the Chicago Police Department.
CommunityArea : the community area in which the crime occurred. Since the 1920s, Chicago has been divided into what are called “community areas”, of which there are now 77. The community areas were devised in an attempt to create socially homogeneous regions.
Year : the year in which the crime occurred.
Latitude : the latitude of the location at which the crime occurred.
Longitude : the longitude of the location at which the crime occurred.

mvt <- read.csv("mvtWeek1.csv")
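The structure output in this walkthrough shows Date and LocationDescription as factors. That matches the behaviour of read.csv before R 4.0.0; from 4.0.0 onward the default is stringsAsFactors = FALSE, so the same columns would load as character vectors. The following is a minimal sketch of the difference, assuming a small inline CSV whose rows are made up for illustration and are not taken from mvtWeek1.csv.

```r
# Toy CSV text standing in for mvtWeek1.csv (rows are hypothetical).
csv_text <- "ID,Date,LocationDescription,Arrest
1,12/31/12 23:15,STREET,FALSE
2,12/31/12 22:00,ALLEY,TRUE
3,12/30/12 9:30,STREET,FALSE"

# stringsAsFactors = TRUE reproduces the pre-4.0 default assumed by the output shown here.
toy <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(toy)                          # Date and LocationDescription appear as factors
nlevels(toy$LocationDescription)  # number of distinct locations in the toy data
```

On a recent R version, passing stringsAsFactors = TRUE to the real read.csv call should give factor columns like the ones printed by str(mvt).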

Analyzing the structure and summary of data

str(mvt)
## 'data.frame':    191641 obs. of  11 variables:
##  $ ID                 : int  8951354 8951141 8952745 8952223 8951608 8950793 8950760 8951611 8951802 8950706 ...
##  $ Date               : Factor w/ 131680 levels "1/1/01 0:01",..: 42824 42823 42823 42823 42822 42821 42820 42819 42817 42816 ...
##  $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 62 72 72 72 72 72 72 72 ...
##  $ Arrest             : logi  FALSE FALSE FALSE FALSE FALSE TRUE ...
##  $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Beat               : int  623 1213 1622 724 211 2521 423 231 1021 1215 ...
##  $ District           : int  6 12 16 7 2 25 4 2 10 12 ...
##  $ CommunityArea      : int  69 24 11 67 35 19 48 40 29 24 ...
##  $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
##  $ Latitude           : num  41.8 41.9 42 41.8 41.8 ...
##  $ Longitude          : num  -87.6 -87.7 -87.8 -87.7 -87.6 ...

** There are 191641 observations and 11 variables in the data set.

Looking at the summary of data

summary(mvt)
##        ID                      Date       
##  Min.   :1310022   5/16/08 0:00  :    11  
##  1st Qu.:2832144   10/17/01 22:00:    10  
##  Median :4762956   4/13/04 21:00 :    10  
##  Mean   :4968629   9/17/05 22:00 :    10  
##  3rd Qu.:7201878   10/12/01 22:00:     9  
##  Max.   :9181151   10/13/01 22:00:     9  
##                    (Other)       :191582  
##                      LocationDescription   Arrest         Domestic      
##  STREET                        :156564   Mode :logical   Mode :logical  
##  PARKING LOT/GARAGE(NON.RESID.): 14852   FALSE:176105    FALSE:191226   
##  OTHER                         :  4573   TRUE :15536     TRUE :415      
##  ALLEY                         :  2308   NA's :0         NA's :0        
##  GAS STATION                   :  2111                                  
##  DRIVEWAY - RESIDENTIAL        :  1675                                  
##  (Other)                       :  9558                                  
##       Beat         District     CommunityArea        Year     
##  Min.   : 111   Min.   : 1.00   Min.   : 0      Min.   :2001  
##  1st Qu.: 722   1st Qu.: 6.00   1st Qu.:22      1st Qu.:2003  
##  Median :1121   Median :10.00   Median :32      Median :2006  
##  Mean   :1259   Mean   :11.82   Mean   :38      Mean   :2006  
##  3rd Qu.:1733   3rd Qu.:17.00   3rd Qu.:60      3rd Qu.:2009  
##  Max.   :2535   Max.   :31.00   Max.   :77      Max.   :2012  
##                 NA's   :43056   NA's   :24616                 
##     Latitude       Longitude     
##  Min.   :41.64   Min.   :-87.93  
##  1st Qu.:41.77   1st Qu.:-87.72  
##  Median :41.85   Median :-87.68  
##  Mean   :41.84   Mean   :-87.68  
##  3rd Qu.:41.92   3rd Qu.:-87.64  
##  Max.   :42.02   Max.   :-87.52  
##  NA's   :2276    NA's   :2276
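The summary reports missing values for District, CommunityArea, Latitude and Longitude (for example, 43056 of the 191641 District values, roughly 22%, are NA). One way to get these counts directly is sketched below on a toy data frame; the values are made up and only the column names mirror the data set.

```r
# Toy data frame with a few missing values (values are hypothetical).
toy <- data.frame(
  District      = c(6, NA, 16, NA),
  CommunityArea = c(69, 24, NA, 67),
  Year          = c(2012, 2012, 2012, 2012)
)

# is.na() gives a logical matrix; column sums count the NAs per variable.
na_counts <- colSums(is.na(toy))
na_counts

# Column means of the same matrix give the share of missing values.
na_share <- colMeans(is.na(toy))
na_share
```

Applied to mvt, colSums(is.na(mvt)) should reproduce the NA's rows of the summary in one line.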

Now we will try to answer a few simple questions based on the summary of the overall data.

1.) What is the maximum value of the variable “ID”?

max(mvt$ID)
## [1] 9181151

2.) What is the minimum value of the variable “Beat”?

min(mvt$Beat)
## [1] 111

3.) How many observations have value TRUE in the Arrest variable (this is the number of crimes for which an arrest was made)?

summary(mvt$Arrest)
##    Mode   FALSE    TRUE    NA's 
## logical  176105   15536       0

** There are 15536 crimes for which an arrest was made.

4.) How many observations have a LocationDescription value of ALLEY?

summary(mvt$LocationDescription == "ALLEY")
##    Mode   FALSE    TRUE    NA's 
## logical  189333    2308       0

** There are 2308 observations which have a LocationDescription value of ALLEY.
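The counts above come from summarising a logical vector. Because TRUE counts as 1 and FALSE as 0, the same numbers can be read off with sum() or table(). A small sketch on a toy vector (the values are made up):

```r
# Toy vector of location descriptions (hypothetical values).
loc <- c("STREET", "ALLEY", "STREET", "GAS STATION", "ALLEY")

# Summing a logical vector counts the TRUE entries.
n_alley <- sum(loc == "ALLEY")
n_alley

# table() counts every distinct value at once; index by name to pick one.
counts <- table(loc)
counts[["ALLEY"]]
```

On the real data, sum(mvt$Arrest) and sum(mvt$LocationDescription == "ALLEY") would be the equivalent one-liners.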

Working with Dates

Summarising the Date variable in the data set

summary(mvt$Date)
##   5/16/08 0:00 10/17/01 22:00  4/13/04 21:00  9/17/05 22:00 10/12/01 22:00 
##             11             10             10             10              9 
## 10/13/01 22:00 10/26/01 22:00 11/17/01 19:00 11/26/03 21:00  12/2/05 21:00 
##              9              9              9              9              9 
##  3/28/02 22:00  4/29/01 22:00   4/5/02 21:00    6/8/12 9:00  7/28/01 23:00 
##              9              9              9              9              9 
##   7/5/02 22:00  8/10/04 23:00  8/30/02 23:00  9/30/10 22:00   9/9/01 22:00 
##              9              9              9              9              9 
## 10/15/01 22:00 10/22/05 19:00 10/31/07 22:00  10/5/01 22:00 11/26/04 21:00 
##              8              8              8              8              8 
##  12/1/04 22:00 12/22/05 12:00 12/29/01 19:00  12/7/05 16:00  2/16/02 21:00 
##              8              8              8              8              8 
##  3/24/05 22:00   3/9/01 20:00   4/4/11 22:00   4/7/02 22:00   5/22/10 1:00 
##              8              8              8              8              8 
##  6/19/12 22:00   6/2/11 23:00  6/27/10 22:00  6/28/01 22:00  7/30/12 22:00 
##              8              8              8              8              8 
##  7/31/08 22:00  8/10/04 20:00   8/12/06 0:00   8/18/02 1:00  8/25/02 22:00 
##              8              8              8              8              8 
##  8/28/06 21:00  9/10/12 22:00  9/17/01 22:00  9/18/01 22:00   9/22/05 0:00 
##              8              8              8              8              8 
##    9/7/07 0:00   1/1/03 21:00  1/10/07 22:00  1/14/03 12:00  1/19/02 22:00 
##              8              7              7              7              7 
##   1/2/04 20:00   1/6/01 19:00   1/6/03 18:00   1/6/11 20:00   1/6/11 21:00 
##              7              7              7              7              7 
## 10/11/01 22:00 10/12/07 13:00 10/13/01 15:00  10/13/05 0:00 10/13/06 20:00 
##              7              7              7              7              7 
## 10/15/07 22:00 10/21/01 19:00 10/21/02 19:00 10/22/03 22:00 10/22/07 20:00 
##              7              7              7              7              7 
## 10/27/10 22:00 10/28/04 22:00 10/29/02 17:00  10/3/01 18:00  10/3/01 21:00 
##              7              7              7              7              7 
## 10/30/04 22:00 11/10/03 22:00 11/13/12 21:00 11/19/02 20:00 11/24/01 17:00 
##              7              7              7              7              7 
## 11/25/03 21:00 11/28/10 22:00   11/3/02 0:00 12/12/03 23:00 12/18/12 20:00 
##              7              7              7              7              7 
##  12/22/10 8:00 12/23/05 12:00  12/7/03 22:00   2/1/03 22:00  2/17/06 22:00 
##              7              7              7              7              7 
##   2/25/02 0:00   3/1/02 21:00  3/19/01 20:00  3/21/07 20:00  3/23/12 21:00 
##              7              7              7              7              7 
##  3/28/07 22:00    3/6/01 0:00  4/11/10 22:00  4/15/01 20:00        (Other) 
##              7              7              7              7         190872

Converting the Date variable into R's Date format and summarising it

DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))
summary(DateConvert)
##         Min.      1st Qu.       Median         Mean      3rd Qu. 
## "2001-01-01" "2003-07-10" "2006-05-21" "2006-08-23" "2009-10-24" 
##         Max. 
## "2012-12-31"
median(DateConvert)
## [1] "2006-05-21"
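To see what the format string is doing, here is a short sketch that parses two of the raw date strings that appear in the summary above. The format "%m/%d/%y %H:%M" reads month, day, two-digit year, hour and minute; the last two lines, which feed in a string in a different layout, are an added illustration and not part of the original analysis.

```r
# Two raw strings in the same layout as mvt$Date.
raw <- c("5/16/08 0:00", "10/17/01 22:00")

# strptime() parses the text; as.Date() keeps only the calendar date.
parsed <- strptime(raw, format = "%m/%d/%y %H:%M")
dates <- as.Date(parsed)
format(dates, "%Y-%m-%d")

# A string that does not match the format becomes NA instead of raising an error.
bad <- as.Date(strptime("2008-05-16", format = "%m/%d/%y %H:%M"))
is.na(bad)
```

Checking sum(is.na(DateConvert)) after the conversion is a cheap way to confirm that every row was parsed.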

Now, let’s extract the month and the day of the week, and add these variables to our data frame mvt.

mvt$Month = months(DateConvert)
mvt$Weekday = weekdays(DateConvert)
mvt$Date = DateConvert
str(mvt)
## 'data.frame':    191641 obs. of  13 variables:
##  $ ID                 : int  8951354 8951141 8952745 8952223 8951608 8950793 8950760 8951611 8951802 8950706 ...
##  $ Date               : Date, format: "2012-12-31" "2012-12-31" ...
##  $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 62 72 72 72 72 72 72 72 ...
##  $ Arrest             : logi  FALSE FALSE FALSE FALSE FALSE TRUE ...
##  $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Beat               : int  623 1213 1622 724 211 2521 423 231 1021 1215 ...
##  $ District           : int  6 12 16 7 2 25 4 2 10 12 ...
##  $ CommunityArea      : int  69 24 11 67 35 19 48 40 29 24 ...
##  $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
##  $ Latitude           : num  41.8 41.9 42 41.8 41.8 ...
##  $ Longitude          : num  -87.6 -87.7 -87.8 -87.7 -87.6 ...
##  $ Month              : chr  "December" "December" "December" "December" ...
##  $ Weekday            : chr  "Monday" "Monday" "Monday" "Monday" ...

Exploring a few questions related to the dates of the thefts.

1.) In which month did the fewest motor vehicle thefts occur?

which.min(table(mvt$Month))
## February 
##        4

2.) On which weekday did the most motor vehicle thefts occur?

which.max(table(mvt$Weekday))
## Friday 
##      1
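Note that the number printed under the name by which.min() and which.max() is not a count. It is the position of that entry in the table, and table() orders character values alphabetically (February is fourth after April, August and December). A minimal sketch on made-up data:

```r
# Toy month labels (hypothetical values).
month <- c("January", "February", "February", "March", "March", "March")

# table() sorts the labels alphabetically: February, January, March.
tab <- table(month)

# which.min() returns the position of the smallest count, named by its label.
which.min(tab)
fewest <- names(which.min(tab))
fewest

# sort() shows the full ranking together with the actual counts.
sort(tab)
```

Printing table(mvt$Month) or sort(table(mvt$Month)) alongside which.min() makes the actual counts visible as well.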

3.) Which month has the largest number of motor vehicle thefts for which an arrest was made?

which.max(table(mvt$Arrest, mvt$Month)["TRUE", ])
## January 
##       5

4.) For what proportion of motor vehicle thefts in 2007 was an arrest made?

sum(mvt$Year == 2007 & mvt$Arrest) / sum(mvt$Year == 2007)
## [1] 0.08487395

5.) For what proportion of motor vehicle thefts in 2012 was an arrest made?

sum(mvt$Year == 2012 & mvt$Arrest) / sum(mvt$Year == 2012)
## [1] 0.03902924
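The two proportions above are each computed for a single year. Since the mean of a logical vector is the share of TRUE values, the arrest rate for every year can be obtained in one call. This is a sketch on a toy data frame with made-up values, not a rerun of the analysis:

```r
# Toy data: year of each theft and whether an arrest was made (hypothetical values).
toy <- data.frame(
  Year   = c(2007, 2007, 2007, 2007, 2012, 2012),
  Arrest = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Mean of Arrest within each Year = proportion of thefts with an arrest.
rate_by_year <- tapply(toy$Arrest, toy$Year, mean)
rate_by_year
```

On the real data, tapply(mvt$Arrest, mvt$Year, mean) should return the 2007 and 2012 figures shown above along with all the years in between.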

Analyzing Popular Location Data

Analyzing this data could be useful to the Chicago Police Department when deciding where to allocate resources. If they want to increase the number of arrests that are made for motor vehicle thefts, where should they focus their efforts?

** We want to find the top five locations where motor vehicle thefts occur.

sort(table(mvt$LocationDescription))
## 
##     AIRPORT BUILDING NON-TERMINAL - SECURE AREA 
##                                               1 
##                  AIRPORT EXTERIOR - SECURE AREA 
##                                               1 
##                                 ANIMAL HOSPITAL 
##                                               1 
##                                 APPLIANCE STORE 
##                                               1 
##                                       CTA TRAIN 
##                                               1 
##                         JAIL / LOCK-UP FACILITY 
##                                               1 
##                                       NEWSSTAND 
##                                               1 
##                                          BRIDGE 
##                                               2 
##               COLLEGE/UNIVERSITY RESIDENCE HALL 
##                                               2 
##                               CURRENCY EXCHANGE 
##                                               2 
##                                   BOWLING ALLEY 
##                                               3 
##                                  CLEANING STORE 
##                                               3 
##                           MEDICAL/DENTAL OFFICE 
##                                               3 
##                              ABANDONED BUILDING 
##                                               4 
## AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA 
##                                               4 
##                                      BARBERSHOP 
##                                               4 
##                  LAKEFRONT/WATERFRONT/RIVERBANK 
##                                               4 
##                                         LIBRARY 
##                                               4 
##                                SAVINGS AND LOAN 
##                                               4 
##  AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA 
##                                               5 
##                                   CHA APARTMENT 
##                                               5 
##                                 DAY CARE CENTER 
##                                               5 
##                                    FIRE STATION 
##                                               5 
##                                 FOREST PRESERVE 
##                                               6 
##                                            BANK 
##                                               7 
##                               CONVENIENCE STORE 
##                                               7 
##                                      DRUG STORE 
##                                               8 
##                 OTHER COMMERCIAL TRANSPORTATION 
##                                               8 
##                                   ATHLETIC CLUB 
##                                               9 
##                   AIRPORT VENDING ESTABLISHMENT 
##                                              10 
##                             AIRPORT PARKING LOT 
##                                              11 
##                       SCHOOL, PRIVATE, BUILDING 
##                                              14 
##                             TAVERN/LIQUOR STORE 
##                                              14 
##                  FACTORY/MANUFACTURING BUILDING 
##                                              16 
##                                   BAR OR TAVERN 
##                                              17 
##                                       WAREHOUSE 
##                                              17 
##                             MOVIE HOUSE/THEATER 
##                                              18 
##                         RESIDENCE PORCH/HALLWAY 
##                                              18 
##                    NURSING HOME/RETIREMENT HOME 
##                                              21 
##                                         TAXICAB 
##                                              21 
##                                DEPARTMENT STORE 
##                                              22 
##                              HIGHWAY/EXPRESSWAY 
##                                              22 
##                        SCHOOL, PRIVATE, GROUNDS 
##                                              23 
##                              VEHICLE-COMMERCIAL 
##                                              23 
##              AIRPORT EXTERIOR - NON-SECURE AREA 
##                                              24 
##               OTHER RAILROAD PROP / TRAIN DEPOT 
##                                              28 
##                              SMALL RETAIL STORE 
##                                              33 
##                               CONSTRUCTION SITE 
##                                              35 
##                                        CAR WASH 
##                                              44 
##                      COLLEGE/UNIVERSITY GROUNDS 
##                                              47 
##                    GOVERNMENT BUILDING/PROPERTY 
##                                              48 
##                                      RESTAURANT 
##                                              49 
##               CHURCH/SYNAGOGUE/PLACE OF WORSHIP 
##                                              56 
##                              GROCERY FOOD STORE 
##                                              80 
##                       HOSPITAL BUILDING/GROUNDS 
##                                             101 
##                        SCHOOL, PUBLIC, BUILDING 
##                                             114 
##                                     HOTEL/MOTEL 
##                                             124 
##                    COMMERCIAL / BUSINESS OFFICE 
##                                             126 
##                     CTA GARAGE / OTHER PROPERTY 
##                                             148 
##                            SPORTS ARENA/STADIUM 
##                                             166 
##                                       APARTMENT 
##                                             184 
##                         SCHOOL, PUBLIC, GROUNDS 
##                                             206 
##                                   PARK PROPERTY 
##                                             255 
##                 POLICE FACILITY/VEH PARKING LOT 
##                                             266 
##                                AIRPORT/AIRCRAFT 
##                                             363 
##                         CHA PARKING LOT/GROUNDS 
##                                             405 
##                                        SIDEWALK 
##                                             462 
##                          VEHICLE NON-COMMERCIAL 
##                                             817 
##                                 VACANT LOT/LAND 
##                                             985 
##                                RESIDENCE-GARAGE 
##                                            1176 
##                                       RESIDENCE 
##                                            1302 
##                   RESIDENTIAL YARD (FRONT/BACK) 
##                                            1536 
##                          DRIVEWAY - RESIDENTIAL 
##                                            1675 
##                                     GAS STATION 
##                                            2111 
##                                           ALLEY 
##                                            2308 
##                                           OTHER 
##                                            4573 
##                  PARKING LOT/GARAGE(NON.RESID.) 
##                                           14852 
##                                          STREET 
##                                          156564

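The table is sorted in ascending order, so the most common locations are at the bottom. Excluding the catch-all OTHER category, the top five are Street, Parking Lot/Garage (Non. Resid.), Alley, Gas Station, and Driveway - Residential.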
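The selection can also be done programmatically instead of reading the end of a long printout. The sketch below uses the seven largest counts copied from the table above, drops OTHER, and keeps the five largest remaining entries; the variable names are illustrative.

```r
# The seven largest location counts, copied from the sorted table above.
counts <- c(
  "STREET" = 156564,
  "PARKING LOT/GARAGE(NON.RESID.)" = 14852,
  "OTHER" = 4573,
  "ALLEY" = 2308,
  "GAS STATION" = 2111,
  "DRIVEWAY - RESIDENTIAL" = 1675,
  "RESIDENTIAL YARD (FRONT/BACK)" = 1536
)

# Drop the catch-all category, sort from largest to smallest, keep the first five.
specific <- counts[names(counts) != "OTHER"]
top5 <- head(sort(specific, decreasing = TRUE), 5)
names(top5)

# Total number of thefts in the five locations.
sum(top5)
```

The total of these five counts is the number of rows one should expect in a subset restricted to these locations.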

Next, we create a subset of the data containing only the observations for which the theft happened in one of these five locations, and call this new data set “Top5”.

Top5 <- subset(mvt, LocationDescription %in% c("STREET", "PARKING LOT/GARAGE(NON.RESID.)",
                                               "ALLEY", "GAS STATION",
                                               "DRIVEWAY - RESIDENTIAL"))
summary(Top5)
##        ID               Date           
##  Min.   :1310022   Min.   :2001-01-01  
##  1st Qu.:2827268   1st Qu.:2003-07-08  
##  Median :4752514   Median :2006-05-16  
##  Mean   :4959006   Mean   :2006-08-18  
##  3rd Qu.:7184899   3rd Qu.:2009-10-15  
##  Max.   :9181151   Max.   :2012-12-31  
##                                        
##                      LocationDescription   Arrest         Domestic      
##  STREET                        :156564   Mode :logical   Mode :logical  
##  PARKING LOT/GARAGE(NON.RESID.): 14852   FALSE:163492    FALSE:177193   
##  ALLEY                         :  2308   TRUE :14018     TRUE :317      
##  GAS STATION                   :  2111   NA's :0         NA's :0        
##  DRIVEWAY - RESIDENTIAL        :  1675                                  
##  ABANDONED BUILDING            :     0                                  
##  (Other)                       :     0                                  
##       Beat         District     CommunityArea        Year     
##  Min.   : 111   Min.   : 1.00   Min.   : 0.00   Min.   :2001  
##  1st Qu.: 722   1st Qu.: 6.00   1st Qu.:22.00   1st Qu.:2003  
##  Median :1121   Median :10.00   Median :31.00   Median :2006  
##  Mean   :1264   Mean   :11.88   Mean   :37.74   Mean   :2006  
##  3rd Qu.:1733   3rd Qu.:17.00   3rd Qu.:59.00   3rd Qu.:2009  
##  Max.   :2535   Max.   :31.00   Max.   :77.00   Max.   :2012  
##                 NA's   :39988   NA's   :22857                 
##     Latitude       Longitude         Month             Weekday         
##  Min.   :41.64   Min.   :-87.92   Length:177510      Length:177510     
##  1st Qu.:41.77   1st Qu.:-87.72   Class :character   Class :character  
##  Median :41.85   Median :-87.68   Mode  :character   Mode  :character  
##  Mean   :41.85   Mean   :-87.68                                        
##  3rd Qu.:41.92   3rd Qu.:-87.64                                        
##  Max.   :42.02   Max.   :-87.52                                        
##  NA's   :2099    NA's   :2099
str(Top5)
## 'data.frame':    177510 obs. of  13 variables:
##  $ ID                 : int  8951354 8951141 8952223 8951608 8950793 8950760 8951611 8951802 8950706 8951585 ...
##  $ Date               : Date, format: "2012-12-31" "2012-12-31" ...
##  $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 72 72 72 72 72 72 72 72 ...
##  $ Arrest             : logi  FALSE FALSE FALSE FALSE TRUE FALSE ...
##  $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Beat               : int  623 1213 724 211 2521 423 231 1021 1215 1011 ...
##  $ District           : int  6 12 7 2 25 4 2 10 12 10 ...
##  $ CommunityArea      : int  69 24 67 35 19 48 40 29 24 29 ...
##  $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
##  $ Latitude           : num  41.8 41.9 41.8 41.8 41.9 ...
##  $ Longitude          : num  -87.6 -87.7 -87.7 -87.6 -87.8 ...
##  $ Month              : chr  "December" "December" "December" "December" ...
##  $ Weekday            : chr  "Monday" "Monday" "Monday" "Monday" ...

** There are 177510 observations in the Top5 subset.

Answering a few questions about arrests and thefts in the top five locations.

1.) One of the locations has a much higher arrest rate than the other locations. Which is it?

Top5$LocationDescription = factor(Top5$LocationDescription)
table(Top5$LocationDescription, Top5$Arrest)
##                                 
##                                   FALSE   TRUE
##   ALLEY                            2059    249
##   DRIVEWAY - RESIDENTIAL           1543    132
##   GAS STATION                      1672    439
##   PARKING LOT/GARAGE(NON.RESID.)  13249   1603
##   STREET                         144969  11595

** Gas Station has by far the highest arrest rate: 439 of its 2111 motor vehicle thefts (about 20.8%) resulted in an arrest, compared with roughly 7% to 11% at the other four locations.
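The arrest rate per location is the number of arrests divided by the total number of thefts at that location. As a sketch, the counts from the table above can be entered by hand and converted to row proportions with prop.table(); the matrix is built manually here only so the example runs without the data file.

```r
# Arrest counts per location, copied from the table above.
arrests <- matrix(
  c(2059, 1543, 1672, 13249, 144969,   # Arrest == FALSE
    249, 132, 439, 1603, 11595),       # Arrest == TRUE
  ncol = 2,
  dimnames = list(
    c("ALLEY", "DRIVEWAY - RESIDENTIAL", "GAS STATION",
      "PARKING LOT/GARAGE(NON.RESID.)", "STREET"),
    c("FALSE", "TRUE")
  )
)

# margin = 1 divides each row by its row total; keep the TRUE column.
rate <- prop.table(arrests, margin = 1)[, "TRUE"]
round(rate, 3)

# Location with the highest arrest rate.
names(which.max(rate))
```

With the data loaded, prop.table(table(Top5$LocationDescription, Top5$Arrest), 1) should give the same proportions without any manual entry.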

2.) On which day of the week do the most motor vehicle thefts at gas stations happen? – Saturday

3.) On which day of the week do the fewest motor vehicle thefts in residential driveways happen? – Saturday
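The two weekday answers above are stated without the code that produced them. One plausible way to obtain them is a cross-tabulation of location against weekday, sketched here on a toy data frame. The rows are made up purely to show the mechanics and say nothing about the real counts.

```r
# Toy data: location and weekday of each theft (hypothetical values).
toy <- data.frame(
  LocationDescription = c("GAS STATION", "GAS STATION", "GAS STATION",
                          "DRIVEWAY - RESIDENTIAL", "DRIVEWAY - RESIDENTIAL",
                          "DRIVEWAY - RESIDENTIAL"),
  Weekday = c("Saturday", "Saturday", "Monday", "Monday", "Monday", "Saturday")
)

# Rows are locations, columns are weekdays.
by_day <- table(toy$LocationDescription, toy$Weekday)
by_day

# Busiest day at gas stations and quietest day in residential driveways.
busiest_gas <- names(which.max(by_day["GAS STATION", ]))
quietest_driveway <- names(which.min(by_day["DRIVEWAY - RESIDENTIAL", ]))
busiest_gas
quietest_driveway
```

On the real data the corresponding call would be table(Top5$LocationDescription, Top5$Weekday).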

[Figure: histogram of the Date variable]

[Figure: box plot of the Date variable, grouped by Arrest]

** If you look at the boxplot, the one for Arrest=TRUE is definitely skewed towards the bottom of the plot, meaning that there were more crimes for which arrests were made in the first half of the time period.
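The plotting commands themselves are not shown in this write-up. A minimal sketch of how a histogram of dates and a box plot of dates split by arrest status might be drawn with base R follows; the toy data and the choice of yearly bins are assumptions for illustration.

```r
# Toy data: theft dates and arrest flags (hypothetical values).
toy <- data.frame(
  Date = as.Date(c("2001-03-01", "2002-07-15", "2003-01-20", "2005-06-30",
                   "2008-09-10", "2010-02-05", "2011-11-11", "2012-12-31")),
  Arrest = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Histogram of dates with one bin per calendar year.
h <- hist(toy$Date, breaks = "years", freq = TRUE)

# One box per arrest status; a lower box means earlier dates.
boxplot(Date ~ Arrest, data = toy, xlab = "Arrest", ylab = "Date")

# Median date in each group, as a numeric check on what the boxes show.
med_true  <- median(toy$Date[toy$Arrest])
med_false <- median(toy$Date[!toy$Arrest])
med_true < med_false
```

In the toy data the arrests are deliberately concentrated in the early years, so the Arrest = TRUE box sits lower, which is the same pattern described for the real plot.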