Chicago is the third most populous city in the United States, with a population of over 2.7 million people. We’ll focus on one specific type of property crime, called “motor vehicle theft” (sometimes referred to as grand theft auto). This is the act of stealing, or attempting to steal, a car.

Load the dataset mvtWeek1.csv and inspect its structure and summary statistics:

mvt = read.csv("mvtWeek1.csv")
str(mvt)
## 'data.frame':    191641 obs. of  11 variables:
##  $ ID                 : int  8951354 8951141 8952745 8952223 8951608 8950793 8950760 8951611 8951802 8950706 ...
##  $ Date               : Factor w/ 131680 levels "1/1/01 0:01",..: 42824 42823 42823 42823 42822 42821 42820 42819 42817 42816 ...
##  $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 62 72 72 72 72 72 72 72 ...
##  $ Arrest             : logi  FALSE FALSE FALSE FALSE FALSE TRUE ...
##  $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Beat               : int  623 1213 1622 724 211 2521 423 231 1021 1215 ...
##  $ District           : int  6 12 16 7 2 25 4 2 10 12 ...
##  $ CommunityArea      : int  69 24 11 67 35 19 48 40 29 24 ...
##  $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
##  $ Latitude           : num  41.8 41.9 42 41.8 41.8 ...
##  $ Longitude          : num  -87.6 -87.7 -87.8 -87.7 -87.6 ...
summary(mvt)
##        ID                      Date       
##  Min.   :1310022   5/16/08 0:00  :    11  
##  1st Qu.:2832144   10/17/01 22:00:    10  
##  Median :4762956   4/13/04 21:00 :    10  
##  Mean   :4968629   9/17/05 22:00 :    10  
##  3rd Qu.:7201878   10/12/01 22:00:     9  
##  Max.   :9181151   10/13/01 22:00:     9  
##                    (Other)       :191582  
##                      LocationDescription   Arrest         Domestic      
##  STREET                        :156564   Mode :logical   Mode :logical  
##  PARKING LOT/GARAGE(NON.RESID.): 14852   FALSE:176105    FALSE:191226   
##  OTHER                         :  4573   TRUE :15536     TRUE :415      
##  ALLEY                         :  2308   NA's :0         NA's :0        
##  GAS STATION                   :  2111                                  
##  DRIVEWAY - RESIDENTIAL        :  1675                                  
##  (Other)                       :  9558                                  
##       Beat         District     CommunityArea        Year     
##  Min.   : 111   Min.   : 1.00   Min.   : 0      Min.   :2001  
##  1st Qu.: 722   1st Qu.: 6.00   1st Qu.:22      1st Qu.:2003  
##  Median :1121   Median :10.00   Median :32      Median :2006  
##  Mean   :1259   Mean   :11.82   Mean   :38      Mean   :2006  
##  3rd Qu.:1733   3rd Qu.:17.00   3rd Qu.:60      3rd Qu.:2009  
##  Max.   :2535   Max.   :31.00   Max.   :77      Max.   :2012  
##                 NA's   :43056   NA's   :24616                 
##     Latitude       Longitude     
##  Min.   :41.64   Min.   :-87.93  
##  1st Qu.:41.77   1st Qu.:-87.72  
##  Median :41.85   Median :-87.68  
##  Mean   :41.84   Mean   :-87.68  
##  3rd Qu.:41.92   3rd Qu.:-87.64  
##  Max.   :42.02   Max.   :-87.52  
##  NA's   :2276    NA's   :2276

How many observations have the value TRUE in the Arrest variable? 15536.

table(mvt$Arrest)
## 
##  FALSE   TRUE 
## 176105  15536
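
The count above can also be expressed as a share of all thefts. The following is a minimal sketch that rebuilds a logical vector from the two counts in the table, so it runs without the CSV; the names `arrest`, `arrest_count`, and `arrest_share` are illustrative and not part of the original analysis.

```r
# Rebuild a logical vector with the same counts as table(mvt$Arrest);
# this stands in for mvt$Arrest so the snippet runs without the CSV.
arrest = rep(c(FALSE, TRUE), times = c(176105, 15536))

# TRUE counts as 1 and FALSE as 0, so sum() counts the arrests
# and mean() gives the fraction of thefts with an arrest.
arrest_count = sum(arrest)
arrest_share = mean(arrest)

print(arrest_count)
print(round(arrest_share, 3))
```

On the full data frame, the same idea would be `sum(mvt$Arrest)` and `mean(mvt$Arrest)`.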

Check the date format, which is Month/Day/Year Hour:Minute:

mvt$Date[1]
## [1] 12/31/12 23:15
## 131680 Levels: 1/1/01 0:01 1/1/01 0:05 1/1/01 0:30 1/1/01 1:17 ... 9/9/12 9:50

Convert these strings into Date objects:

DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))
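
To see what the format string does, here is a small sketch on a few sample strings in the same layout as mvt$Date. The vectors `raw`, `parsed`, `dates`, and `bad` are illustrative names, and the last case is an assumption about how a mismatched string would look in practice.

```r
# Sample strings in the same Month/Day/Year Hour:Minute layout as mvt$Date
raw = c("12/31/12 23:15", "1/1/01 0:01", "5/16/08 0:00")

# %m = month, %d = day, %y = two-digit year, %H:%M = 24-hour time
parsed = strptime(raw, format = "%m/%d/%y %H:%M")

# as.Date() keeps the calendar date and drops the time of day
dates = as.Date(parsed)
print(dates)

# A string that does not match the format becomes NA rather than an error
bad = as.Date(strptime("2012-12-31", format = "%m/%d/%y %H:%M"))
print(is.na(bad))
```

Because mismatches turn into NA silently, a quick `sum(is.na(DateConvert))` after the conversion is a cheap way to confirm that every row parsed.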

Extract the month and the day of the week and add these variables to our data frame mvt.

mvt$Month = months(DateConvert)
mvt$Weekday = weekdays(DateConvert)

Replace the old Date variable with DateConvert:

mvt$Date = DateConvert

In which month did the fewest motor vehicle thefts occur? February (13511 thefts).

table(mvt$Month)
## 
##     April    August  December  February   January      July      June 
##     15280     16572     16426     13511     16047     16801     16002 
##     March       May  November   October September 
##     15758     16035     16063     17086     16060
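
table() lists the month names alphabetically, which makes seasonal patterns harder to spot. One possible way to get calendar order is sketched below on a few made-up dates; it derives English month names from the month number, since months() depends on the system locale. All variable names here are illustrative.

```r
# A few made-up dates standing in for mvt$Date
dates = as.Date(c("2012-12-31", "2012-02-14", "2012-02-20",
                  "2012-07-04", "2012-07-05", "2012-07-06"))

# Take the month number from the date, then look up the English name,
# so the result does not depend on the system locale
month_num = as.integer(format(dates, "%m"))
month_fac = factor(month.name[month_num], levels = month.name)

# Counts now appear January..December; months with no dates show 0
counts = table(month_fac)
print(counts)

# Restrict to months that occur before asking for the smallest count
observed = counts[counts > 0]
print(names(which.min(observed)))
```

With the full data every month occurs, so `names(which.min(table(mvt$Month)))` would answer the question directly, and which.max() does the same for the largest count.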

On which weekday did the most motor vehicle thefts occur? Friday (29284 thefts).

table(mvt$Weekday)
## 
##    Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday 
##     29284     27397     27118     26316     27319     26791     27416

Each observation in the dataset represents a motor vehicle theft, and the Arrest variable indicates whether an arrest was later made for this theft. Which month has the largest number of motor vehicle thefts for which an arrest was made? January (1435 arrests).

table(mvt$Arrest, mvt$Month)
##        
##         April August December February January  July  June March   May
##   FALSE 14028  15243    15029    12273   14612 15477 14772 14460 14848
##   TRUE   1252   1329     1397     1238    1435  1324  1230  1298  1187
##        
##         November October September
##   FALSE    14807   15744     14812
##   TRUE      1256    1342      1248

Make a histogram of the theft dates, then a boxplot of the dates grouped by whether an arrest was made:

hist(mvt$Date, breaks=100)

boxplot(mvt$Date ~ mvt$Arrest)

For what proportion of motor vehicle thefts in 2001 was an arrest made? 2152 / (18517 + 2152) ≈ 0.104.

table(mvt$Arrest, mvt$Year)
##        
##          2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011
##   FALSE 18517 16638 14859 15169 14956 14796 13068 13425 11327 14796 15012
##   TRUE   2152  2115  1798  1693  1528  1302  1212  1020   840   701   625
##        
##          2012
##   FALSE 13542
##   TRUE    550
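
Rather than dividing by hand for each year, prop.table() can convert a whole table of counts to proportions. The sketch below uses the counts for 2001, 2007, and 2012 copied from the table above; the matrix `counts` is a stand-in for the real table and its name is illustrative.

```r
# Arrest counts for three of the years, copied from the table above.
# matrix() fills column by column, so each pair is one year (FALSE, TRUE).
counts = matrix(c(18517, 2152,
                  13068, 1212,
                  13542,  550),
                nrow = 2,
                dimnames = list(Arrest = c("FALSE", "TRUE"),
                                Year = c("2001", "2007", "2012")))

# margin = 2 divides each column by its column total,
# giving the share of thefts with and without an arrest per year
shares = prop.table(counts, margin = 2)
print(round(shares["TRUE", ], 3))
```

On the full data, `prop.table(table(mvt$Arrest, mvt$Year), margin = 2)` would give the same shares for every year at once.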

A plain table of the LocationDescription variable is hard to read because there are 78 different locations in the data set. The sort function shows the same table ordered by the number of observations in each category:

sort(table(mvt$LocationDescription))
## 
##     AIRPORT BUILDING NON-TERMINAL - SECURE AREA 
##                                               1 
##                  AIRPORT EXTERIOR - SECURE AREA 
##                                               1 
##                                 ANIMAL HOSPITAL 
##                                               1 
##                                 APPLIANCE STORE 
##                                               1 
##                                       CTA TRAIN 
##                                               1 
##                         JAIL / LOCK-UP FACILITY 
##                                               1 
##                                       NEWSSTAND 
##                                               1 
##                                          BRIDGE 
##                                               2 
##               COLLEGE/UNIVERSITY RESIDENCE HALL 
##                                               2 
##                               CURRENCY EXCHANGE 
##                                               2 
##                                   BOWLING ALLEY 
##                                               3 
##                                  CLEANING STORE 
##                                               3 
##                           MEDICAL/DENTAL OFFICE 
##                                               3 
##                              ABANDONED BUILDING 
##                                               4 
## AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA 
##                                               4 
##                                      BARBERSHOP 
##                                               4 
##                  LAKEFRONT/WATERFRONT/RIVERBANK 
##                                               4 
##                                         LIBRARY 
##                                               4 
##                                SAVINGS AND LOAN 
##                                               4 
##  AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA 
##                                               5 
##                                   CHA APARTMENT 
##                                               5 
##                                 DAY CARE CENTER 
##                                               5 
##                                    FIRE STATION 
##                                               5 
##                                 FOREST PRESERVE 
##                                               6 
##                                            BANK 
##                                               7 
##                               CONVENIENCE STORE 
##                                               7 
##                                      DRUG STORE 
##                                               8 
##                 OTHER COMMERCIAL TRANSPORTATION 
##                                               8 
##                                   ATHLETIC CLUB 
##                                               9 
##                   AIRPORT VENDING ESTABLISHMENT 
##                                              10 
##                             AIRPORT PARKING LOT 
##                                              11 
##                       SCHOOL, PRIVATE, BUILDING 
##                                              14 
##                             TAVERN/LIQUOR STORE 
##                                              14 
##                  FACTORY/MANUFACTURING BUILDING 
##                                              16 
##                                   BAR OR TAVERN 
##                                              17 
##                                       WAREHOUSE 
##                                              17 
##                             MOVIE HOUSE/THEATER 
##                                              18 
##                         RESIDENCE PORCH/HALLWAY 
##                                              18 
##                    NURSING HOME/RETIREMENT HOME 
##                                              21 
##                                         TAXICAB 
##                                              21 
##                                DEPARTMENT STORE 
##                                              22 
##                              HIGHWAY/EXPRESSWAY 
##                                              22 
##                        SCHOOL, PRIVATE, GROUNDS 
##                                              23 
##                              VEHICLE-COMMERCIAL 
##                                              23 
##              AIRPORT EXTERIOR - NON-SECURE AREA 
##                                              24 
##               OTHER RAILROAD PROP / TRAIN DEPOT 
##                                              28 
##                              SMALL RETAIL STORE 
##                                              33 
##                               CONSTRUCTION SITE 
##                                              35 
##                                        CAR WASH 
##                                              44 
##                      COLLEGE/UNIVERSITY GROUNDS 
##                                              47 
##                    GOVERNMENT BUILDING/PROPERTY 
##                                              48 
##                                      RESTAURANT 
##                                              49 
##               CHURCH/SYNAGOGUE/PLACE OF WORSHIP 
##                                              56 
##                              GROCERY FOOD STORE 
##                                              80 
##                       HOSPITAL BUILDING/GROUNDS 
##                                             101 
##                        SCHOOL, PUBLIC, BUILDING 
##                                             114 
##                                     HOTEL/MOTEL 
##                                             124 
##                    COMMERCIAL / BUSINESS OFFICE 
##                                             126 
##                     CTA GARAGE / OTHER PROPERTY 
##                                             148 
##                            SPORTS ARENA/STADIUM 
##                                             166 
##                                       APARTMENT 
##                                             184 
##                         SCHOOL, PUBLIC, GROUNDS 
##                                             206 
##                                   PARK PROPERTY 
##                                             255 
##                 POLICE FACILITY/VEH PARKING LOT 
##                                             266 
##                                AIRPORT/AIRCRAFT 
##                                             363 
##                         CHA PARKING LOT/GROUNDS 
##                                             405 
##                                        SIDEWALK 
##                                             462 
##                          VEHICLE NON-COMMERCIAL 
##                                             817 
##                                 VACANT LOT/LAND 
##                                             985 
##                                RESIDENCE-GARAGE 
##                                            1176 
##                                       RESIDENCE 
##                                            1302 
##                   RESIDENTIAL YARD (FRONT/BACK) 
##                                            1536 
##                          DRIVEWAY - RESIDENTIAL 
##                                            1675 
##                                     GAS STATION 
##                                            2111 
##                                           ALLEY 
##                                            2308 
##                                           OTHER 
##                                            4573 
##                  PARKING LOT/GARAGE(NON.RESID.) 
##                                           14852 
##                                          STREET 
##                                          156564
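
When only the most frequent categories matter, the sorted table can be trimmed instead of printed in full. This is a small sketch on made-up labels; `loc` and `top3` are illustrative names and the counts are not from the data set.

```r
# Made-up location labels standing in for mvt$LocationDescription
loc = factor(c(rep("STREET", 6), rep("ALLEY", 3),
               rep("GAS STATION", 2), "BRIDGE"))

# decreasing = TRUE puts the most frequent category first;
# head() then keeps only the first n entries
top3 = head(sort(table(loc), decreasing = TRUE), n = 3)
print(top3)
```

Applied to the real column, `head(sort(table(mvt$LocationDescription), decreasing = TRUE), n = 6)` would show just the six largest categories.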

Create a subset of the data containing only thefts in the five most common locations, excluding the "OTHER" category: STREET, PARKING LOT/GARAGE(NON.RESID.), ALLEY, GAS STATION, and DRIVEWAY - RESIDENTIAL.

# Keep only thefts at the five most common locations (excluding "OTHER")
TopLocations = c("STREET", "PARKING LOT/GARAGE(NON.RESID.)", "ALLEY", "GAS STATION", "DRIVEWAY - RESIDENTIAL")
Top5 = subset(mvt, LocationDescription %in% TopLocations)
# Drop the unused factor levels so tables list only these five locations
Top5$LocationDescription = factor(Top5$LocationDescription)
str(Top5)
## 'data.frame':    177510 obs. of  13 variables:
##  $ ID                 : int  8951354 8951141 8952223 8951608 8950793 8950760 8951611 8951802 8950706 8951585 ...
##  $ Date               : Date, format: "2012-12-31" "2012-12-31" ...
##  $ LocationDescription: Factor w/ 5 levels "ALLEY","DRIVEWAY - RESIDENTIAL",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ Arrest             : logi  FALSE FALSE FALSE FALSE TRUE FALSE ...
##  $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Beat               : int  623 1213 724 211 2521 423 231 1021 1215 1011 ...
##  $ District           : int  6 12 7 2 25 4 2 10 12 10 ...
##  $ CommunityArea      : int  69 24 67 35 19 48 40 29 24 29 ...
##  $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
##  $ Latitude           : num  41.8 41.9 41.8 41.8 41.9 ...
##  $ Longitude          : num  -87.6 -87.7 -87.7 -87.6 -87.8 ...
##  $ Month              : chr  "December" "December" "December" "December" ...
##  $ Weekday            : chr  "Monday" "Monday" "Monday" "Monday" ...
table(Top5$LocationDescription, Top5$Arrest)
##                                 
##                                   FALSE   TRUE
##   ALLEY                            2059    249
##   DRIVEWAY - RESIDENTIAL           1543    132
##   GAS STATION                      1672    439
##   PARKING LOT/GARAGE(NON.RESID.)  13249   1603
##   STREET                         144969  11595
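
Raw counts are hard to compare across locations because STREET dominates the data. The sketch below turns the counts from the table above into an arrest rate per location; the matrix `counts` and the vector `rates` are illustrative names, and the numbers are copied from the output rather than read from the CSV.

```r
# Counts copied from table(Top5$LocationDescription, Top5$Arrest) above
counts = matrix(c(  2059,   249,
                    1543,   132,
                    1672,   439,
                   13249,  1603,
                  144969, 11595),
                ncol = 2, byrow = TRUE,
                dimnames = list(
                  Location = c("ALLEY", "DRIVEWAY - RESIDENTIAL",
                               "GAS STATION",
                               "PARKING LOT/GARAGE(NON.RESID.)", "STREET"),
                  Arrest = c("FALSE", "TRUE")))

# margin = 1 divides each row by its row total, so the TRUE column
# is the fraction of thefts at that location that led to an arrest
rates = prop.table(counts, margin = 1)[, "TRUE"]
print(round(sort(rates, decreasing = TRUE), 3))
```

With these counts, gas stations have the highest arrest rate (439 of 2111 thefts, about 21%), while street thefts have the lowest (about 7%).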