Background Information on the Dataset

Crime is an international concern, but it is documented and handled in very different ways in different countries. In the United States, violent crimes and property crimes are recorded by the Federal Bureau of Investigation (FBI). Additionally, each city documents crime, and some cities release data regarding crime rates. The city of Chicago, Illinois, publishes its crime data from 2001 onward online.

Chicago is the third most populous city in the United States, with a population of over 2.7 million people. The city of Chicago is shown in the map below, with the state of Illinois highlighted in red.

There are two main types of crimes: violent crimes and property crimes. In this problem, we’ll focus on one specific type of property crime, called “motor vehicle theft” (sometimes referred to as grand theft auto). This is the act of stealing, or attempting to steal, a car. We’ll use some basic data analysis in R to understand the motor vehicle thefts in Chicago.

Please download the file mvtWeek1.csv for this problem. Do not open this file in any spreadsheet software before completing this problem, because it might change the format of the Date field. Here is a list of descriptions of the variables:

R Exercises

Read the dataset mvtWeek1.csv into R, using the read.csv function, and call the data frame “mvt”. Remember to navigate to the directory on your computer containing the file mvtWeek1.csv first. It may take a few minutes to read in the data, since it is pretty large. Then, use the str and summary functions to answer the following questions.

mvt = read.csv("mvtWeek1.csv")
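The str output in this walkthrough lists Date and LocationDescription as Factor columns, which is what read.csv produced by default before R 4.0.0. From R 4.0.0 onward the default is stringsAsFactors = FALSE, so the same columns would show up as chr unless factors are requested explicitly. A minimal sketch of the difference, using a tiny made-up CSV written to a temporary file (the rows and the names toy_factor and toy_chr are illustrative, not part of the dataset):

```r
# Toy CSV written to a temporary file; the rows are made up for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,Date,Arrest",
             "1,12/31/12 23:15,FALSE",
             "2,1/1/01 0:01,TRUE"), tmp)

# Ask explicitly for factors, matching the pre-4.0.0 default.
toy_factor <- read.csv(tmp, stringsAsFactors = TRUE)

# Ask explicitly for character strings, the default since R 4.0.0.
toy_chr <- read.csv(tmp, stringsAsFactors = FALSE)

class(toy_factor$Date)  # "factor"
class(toy_chr$Date)     # "character"
class(toy_chr$Arrest)   # "logical" in both cases

unlink(tmp)  # remove the temporary file
```

The later steps (strptime, table, summary) work with either representation, but summary only prints per-value counts for factor columns.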

How many rows of data (observations) are in this dataset?

str(mvt)
## 'data.frame':    191641 obs. of  11 variables:
##  $ ID                 : int  8951354 8951141 8952745 8952223 8951608 8950793 8950760 8951611 8951802 8950706 ...
##  $ Date               : Factor w/ 131680 levels "1/1/01 0:01",..: 42824 42823 42823 42823 42822 42821 42820 42819 42817 42816 ...
##  $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 62 72 72 72 72 72 72 72 ...
##  $ Arrest             : logi  FALSE FALSE FALSE FALSE FALSE TRUE ...
##  $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Beat               : int  623 1213 1622 724 211 2521 423 231 1021 1215 ...
##  $ District           : int  6 12 16 7 2 25 4 2 10 12 ...
##  $ CommunityArea      : int  69 24 11 67 35 19 48 40 29 24 ...
##  $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
##  $ Latitude           : num  41.8 41.9 42 41.8 41.8 ...
##  $ Longitude          : num  -87.6 -87.7 -87.8 -87.7 -87.6 ...

Explanation: If you type str(mvt) in the R console, the first row of output says that this is a data frame with 191,641 observations.

How many variables are in this dataset?

str(mvt)
## 'data.frame':    191641 obs. of  11 variables:
##  $ ID                 : int  8951354 8951141 8952745 8952223 8951608 8950793 8950760 8951611 8951802 8950706 ...
##  $ Date               : Factor w/ 131680 levels "1/1/01 0:01",..: 42824 42823 42823 42823 42822 42821 42820 42819 42817 42816 ...
##  $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 62 72 72 72 72 72 72 72 ...
##  $ Arrest             : logi  FALSE FALSE FALSE FALSE FALSE TRUE ...
##  $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Beat               : int  623 1213 1622 724 211 2521 423 231 1021 1215 ...
##  $ District           : int  6 12 16 7 2 25 4 2 10 12 ...
##  $ CommunityArea      : int  69 24 11 67 35 19 48 40 29 24 ...
##  $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
##  $ Latitude           : num  41.8 41.9 42 41.8 41.8 ...
##  $ Longitude          : num  -87.6 -87.7 -87.8 -87.7 -87.6 ...

Explanation: If you type str(mvt) in the R console, the first row of output says that this is a data frame with 11 variables.
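The row and column counts can also be queried directly instead of being read off the str header. A small sketch on a made-up data frame (toy is a hypothetical stand-in for mvt):

```r
# Made-up data frame standing in for mvt.
toy <- data.frame(ID = c(1L, 2L, 3L),
                  Arrest = c(FALSE, TRUE, FALSE))

nrow(toy)  # 3, the number of observations
ncol(toy)  # 2, the number of variables
dim(toy)   # 3 2, both at once (rows, then columns)
```

On the real data, nrow(mvt) and ncol(mvt) should agree with the 191641 observations and 11 variables reported by str.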

Using the “max” function, what is the maximum value of the variable “ID”?

max(mvt$ID)
## [1] 9181151

Explanation: You can compute the maximum value of the ID variable with max(mvt$ID).

What is the minimum value of the variable “Beat”?

summary(mvt) 
##        ID                      Date                            LocationDescription   Arrest         Domestic            Beat         District    
##  Min.   :1310022   5/16/08 0:00  :    11   STREET                        :156564   Mode :logical   Mode :logical   Min.   : 111   Min.   : 1.00  
##  1st Qu.:2832144   10/17/01 22:00:    10   PARKING LOT/GARAGE(NON.RESID.): 14852   FALSE:176105    FALSE:191226    1st Qu.: 722   1st Qu.: 6.00  
##  Median :4762956   4/13/04 21:00 :    10   OTHER                         :  4573   TRUE :15536     TRUE :415       Median :1121   Median :10.00  
##  Mean   :4968629   9/17/05 22:00 :    10   ALLEY                         :  2308                                   Mean   :1259   Mean   :11.82  
##  3rd Qu.:7201878   10/12/01 22:00:     9   GAS STATION                   :  2111                                   3rd Qu.:1733   3rd Qu.:17.00  
##  Max.   :9181151   10/13/01 22:00:     9   DRIVEWAY - RESIDENTIAL        :  1675                                   Max.   :2535   Max.   :31.00  
##                    (Other)       :191582   (Other)                       :  9558                                                  NA's   :43056  
##  CommunityArea        Year         Latitude       Longitude     
##  Min.   : 0      Min.   :2001   Min.   :41.64   Min.   :-87.93  
##  1st Qu.:22      1st Qu.:2003   1st Qu.:41.77   1st Qu.:-87.72  
##  Median :32      Median :2006   Median :41.85   Median :-87.68  
##  Mean   :38      Mean   :2006   Mean   :41.84   Mean   :-87.68  
##  3rd Qu.:60      3rd Qu.:2009   3rd Qu.:41.92   3rd Qu.:-87.64  
##  Max.   :77      Max.   :2012   Max.   :42.02   Max.   :-87.52  
##  NA's   :24616                  NA's   :2276    NA's   :2276

min(mvt$Beat)
## [1] 111

Explanation: If you type summary(mvt) in your R console, you can see the summary statistics for each variable. This shows that the minimum value of Beat is 111. Alternatively, you could use the min function by typing min(mvt$Beat).
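One caveat when switching from summary to min or max: the summary output reports NA's for District, CommunityArea, Latitude and Longitude, and min returns NA for a vector containing missing values unless na.rm = TRUE is passed. Beat has no NA's, so min(mvt$Beat) works as is. A sketch with made-up values (district is a hypothetical vector, not the real column):

```r
# Made-up values with one missing entry.
district <- c(6L, 12L, NA, 2L)

min(district)                               # NA, because of the missing value
min_district <- min(district, na.rm = TRUE) # 2, missing value dropped first
range(district, na.rm = TRUE)               # 2 12, minimum and maximum together
```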

How many observations have value TRUE in the Arrest variable (this is the number of crimes for which an arrest was made)?

summary(mvt) 
##        ID                      Date                            LocationDescription   Arrest         Domestic            Beat         District    
##  Min.   :1310022   5/16/08 0:00  :    11   STREET                        :156564   Mode :logical   Mode :logical   Min.   : 111   Min.   : 1.00  
##  1st Qu.:2832144   10/17/01 22:00:    10   PARKING LOT/GARAGE(NON.RESID.): 14852   FALSE:176105    FALSE:191226    1st Qu.: 722   1st Qu.: 6.00  
##  Median :4762956   4/13/04 21:00 :    10   OTHER                         :  4573   TRUE :15536     TRUE :415       Median :1121   Median :10.00  
##  Mean   :4968629   9/17/05 22:00 :    10   ALLEY                         :  2308                                   Mean   :1259   Mean   :11.82  
##  3rd Qu.:7201878   10/12/01 22:00:     9   GAS STATION                   :  2111                                   3rd Qu.:1733   3rd Qu.:17.00  
##  Max.   :9181151   10/13/01 22:00:     9   DRIVEWAY - RESIDENTIAL        :  1675                                   Max.   :2535   Max.   :31.00  
##                    (Other)       :191582   (Other)                       :  9558                                                  NA's   :43056  
##  CommunityArea        Year         Latitude       Longitude     
##  Min.   : 0      Min.   :2001   Min.   :41.64   Min.   :-87.93  
##  1st Qu.:22      1st Qu.:2003   1st Qu.:41.77   1st Qu.:-87.72  
##  Median :32      Median :2006   Median :41.85   Median :-87.68  
##  Mean   :38      Mean   :2006   Mean   :41.84   Mean   :-87.68  
##  3rd Qu.:60      3rd Qu.:2009   3rd Qu.:41.92   3rd Qu.:-87.64  
##  Max.   :77      Max.   :2012   Max.   :42.02   Max.   :-87.52  
##  NA's   :24616                  NA's   :2276    NA's   :2276

Explanation: If you type summary(mvt) in your R console, you can see the summary statistics for each variable. This shows that 15,536 observations fall under the category TRUE for the variable Arrest.
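Because R treats TRUE as 1 and FALSE as 0 in arithmetic, the count can also be computed directly. A sketch on a made-up logical vector (arrest here is illustrative, not the real column):

```r
# Made-up logical vector standing in for mvt$Arrest.
arrest <- c(FALSE, TRUE, FALSE, TRUE, TRUE)

n_arrests <- sum(arrest)     # 3, number of TRUE values
arrest_share <- mean(arrest) # 0.6, fraction of TRUE values
table(arrest)                # counts of FALSE and TRUE side by side
```

Applied to the real data, sum(mvt$Arrest) should reproduce the 15,536 shown by summary, which is 15536 / 191641, or roughly 8% of all observations.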

How many observations have a LocationDescription value of ALLEY?

summary(mvt) 
##        ID                      Date                            LocationDescription   Arrest         Domestic            Beat         District    
##  Min.   :1310022   5/16/08 0:00  :    11   STREET                        :156564   Mode :logical   Mode :logical   Min.   : 111   Min.   : 1.00  
##  1st Qu.:2832144   10/17/01 22:00:    10   PARKING LOT/GARAGE(NON.RESID.): 14852   FALSE:176105    FALSE:191226    1st Qu.: 722   1st Qu.: 6.00  
##  Median :4762956   4/13/04 21:00 :    10   OTHER                         :  4573   TRUE :15536     TRUE :415       Median :1121   Median :10.00  
##  Mean   :4968629   9/17/05 22:00 :    10   ALLEY                         :  2308                                   Mean   :1259   Mean   :11.82  
##  3rd Qu.:7201878   10/12/01 22:00:     9   GAS STATION                   :  2111                                   3rd Qu.:1733   3rd Qu.:17.00  
##  Max.   :9181151   10/13/01 22:00:     9   DRIVEWAY - RESIDENTIAL        :  1675                                   Max.   :2535   Max.   :31.00  
##                    (Other)       :191582   (Other)                       :  9558                                                  NA's   :43056  
##  CommunityArea        Year         Latitude       Longitude     
##  Min.   : 0      Min.   :2001   Min.   :41.64   Min.   :-87.93  
##  1st Qu.:22      1st Qu.:2003   1st Qu.:41.77   1st Qu.:-87.72  
##  Median :32      Median :2006   Median :41.85   Median :-87.68  
##  Mean   :38      Mean   :2006   Mean   :41.84   Mean   :-87.68  
##  3rd Qu.:60      3rd Qu.:2009   3rd Qu.:41.92   3rd Qu.:-87.64  
##  Max.   :77      Max.   :2012   Max.   :42.02   Max.   :-87.52  
##  NA's   :24616                  NA's   :2276    NA's   :2276

table(mvt$LocationDescription)
## 
##                              ABANDONED BUILDING AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA     AIRPORT BUILDING NON-TERMINAL - SECURE AREA 
##                                               4                                               4                                               1 
##              AIRPORT EXTERIOR - NON-SECURE AREA                  AIRPORT EXTERIOR - SECURE AREA                             AIRPORT PARKING LOT 
##                                              24                                               1                                              11 
##  AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA                   AIRPORT VENDING ESTABLISHMENT                                AIRPORT/AIRCRAFT 
##                                               5                                              10                                             363 
##                                           ALLEY                                 ANIMAL HOSPITAL                                       APARTMENT 
##                                            2308                                               1                                             184 
##                                 APPLIANCE STORE                                   ATHLETIC CLUB                                            BANK 
##                                               1                                               9                                               7 
##                                   BAR OR TAVERN                                      BARBERSHOP                                   BOWLING ALLEY 
##                                              17                                               4                                               3 
##                                          BRIDGE                                        CAR WASH                                   CHA APARTMENT 
##                                               2                                              44                                               5 
##                         CHA PARKING LOT/GROUNDS               CHURCH/SYNAGOGUE/PLACE OF WORSHIP                                  CLEANING STORE 
##                                             405                                              56                                               3 
##                      COLLEGE/UNIVERSITY GROUNDS               COLLEGE/UNIVERSITY RESIDENCE HALL                    COMMERCIAL / BUSINESS OFFICE 
##                                              47                                               2                                             126 
##                               CONSTRUCTION SITE                               CONVENIENCE STORE                     CTA GARAGE / OTHER PROPERTY 
##                                              35                                               7                                             148 
##                                       CTA TRAIN                               CURRENCY EXCHANGE                                 DAY CARE CENTER 
##                                               1                                               2                                               5 
##                                DEPARTMENT STORE                          DRIVEWAY - RESIDENTIAL                                      DRUG STORE 
##                                              22                                            1675                                               8 
##                  FACTORY/MANUFACTURING BUILDING                                    FIRE STATION                                 FOREST PRESERVE 
##                                              16                                               5                                               6 
##                                     GAS STATION                    GOVERNMENT BUILDING/PROPERTY                              GROCERY FOOD STORE 
##                                            2111                                              48                                              80 
##                              HIGHWAY/EXPRESSWAY                       HOSPITAL BUILDING/GROUNDS                                     HOTEL/MOTEL 
##                                              22                                             101                                             124 
##                         JAIL / LOCK-UP FACILITY                  LAKEFRONT/WATERFRONT/RIVERBANK                                         LIBRARY 
##                                               1                                               4                                               4 
##                           MEDICAL/DENTAL OFFICE                             MOVIE HOUSE/THEATER                                       NEWSSTAND 
##                                               3                                              18                                               1 
##                    NURSING HOME/RETIREMENT HOME                                           OTHER                 OTHER COMMERCIAL TRANSPORTATION 
##                                              21                                            4573                                               8 
##               OTHER RAILROAD PROP / TRAIN DEPOT                                   PARK PROPERTY                  PARKING LOT/GARAGE(NON.RESID.) 
##                                              28                                             255                                           14852 
##                 POLICE FACILITY/VEH PARKING LOT                                       RESIDENCE                                RESIDENCE-GARAGE 
##                                             266                                            1302                                            1176 
##                         RESIDENCE PORCH/HALLWAY                   RESIDENTIAL YARD (FRONT/BACK)                                      RESTAURANT 
##                                              18                                            1536                                              49 
##                                SAVINGS AND LOAN                       SCHOOL, PRIVATE, BUILDING                        SCHOOL, PRIVATE, GROUNDS 
##                                               4                                              14                                              23 
##                        SCHOOL, PUBLIC, BUILDING                         SCHOOL, PUBLIC, GROUNDS                                        SIDEWALK 
##                                             114                                             206                                             462 
##                              SMALL RETAIL STORE                            SPORTS ARENA/STADIUM                                          STREET 
##                                              33                                             166                                          156564 
##                             TAVERN/LIQUOR STORE                                         TAXICAB                                 VACANT LOT/LAND 
##                                              14                                              21                                             985 
##                              VEHICLE-COMMERCIAL                          VEHICLE NON-COMMERCIAL                                       WAREHOUSE 
##                                              23                                             817                                              17

Explanation: If you type summary(mvt) in your R console, you can see the summary statistics for each variable. This shows that 2,308 observations fall under the category ALLEY for the variable LocationDescription. You can also read this from table(mvt$LocationDescription).
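With 78 location categories, scanning the full table is tedious. Two shorter routes are to count matches with a comparison, or to index the table by name. A sketch on made-up values (loc is a hypothetical stand-in for mvt$LocationDescription):

```r
# Made-up location values.
loc <- factor(c("STREET", "ALLEY", "STREET", "OTHER", "ALLEY", "STREET"))

n_alley <- sum(loc == "ALLEY")                 # 2, count of matching entries
n_alley_tab <- as.integer(table(loc)["ALLEY"]) # 2, same count via the table

# Most frequent categories first, keeping only the top two.
top2 <- head(sort(table(loc), decreasing = TRUE), 2)
names(top2)  # "STREET" "ALLEY"
```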

In what format are the entries in the variable Date?

mvt$Date[1] 
## [1] 12/31/12 23:15
## 131680 Levels: 1/1/01 0:01 1/1/01 0:05 1/1/01 0:30 1/1/01 1:17 1/1/01 1:50 1/1/01 10:00 1/1/01 10:12 1/1/01 11:00 1/1/01 12:00 1/1/01 13:00 ... 9/9/12 9:50

Explanation: If you type mvt$Date[1] in your R console, you can see that the first entry is 12/31/12 23:15. Since there is no 31st month, the second field must be the day, so the format is Month/Day/Year Hour:Minute.

Now, let’s convert these date strings into a Date object in R. In your R console, type

DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))

This converts the variable “Date” into a Date object in R. Take a look at the variable DateConvert using the summary function.
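To see what the conversion does step by step, here is a sketch on two strings taken from the output above. The tz = "UTC" argument and the variable names are additions for this illustration, not part of the original command; the time zone is pinned only so the sketch does not depend on local settings.

```r
# Two date strings in the Month/Day/Year Hour:Minute format.
raw <- c("12/31/12 23:15", "1/1/01 0:01")

# strptime parses the text into a date-time (POSIXlt) object.
parsed <- strptime(raw, "%m/%d/%y %H:%M", tz = "UTC")

# as.Date keeps only the calendar date and drops the time of day.
dates <- as.Date(parsed)
class(dates)   # "Date"
format(dates)  # "2012-12-31" "2001-01-01"

# A string that does not match the format is converted to NA, not an error.
bad <- as.Date(strptime("2012-12-31", "%m/%d/%y %H:%M", tz = "UTC"))
bad            # NA
```

Checking sum(is.na(DateConvert)) after the conversion is a quick way to confirm that every entry was parsed.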

What is the month and year of the median date in our dataset?

summary(DateConvert)
##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "2001-01-01" "2003-07-10" "2006-05-21" "2006-08-23" "2009-10-24" "2012-12-31"

Explanation: If you type summary(DateConvert), you can see that the median date is 2006-05-21, so the answer is May 2006.
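The median can also be computed directly and formatted as year and month. A sketch on three made-up dates (d and med are illustrative names):

```r
# Three made-up dates; the middle one is the median.
d <- as.Date(c("2001-01-01", "2006-05-21", "2012-12-31"))

med <- median(d)                   # still a Date object
med_label <- format(med, "%Y-%m")  # "2006-05", year and month only
med_label
```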

Now, let’s extract the month and the day of the week, and add these variables to our data frame mvt. We can do this with two simple functions. Type the following commands in R:

mvt$Month = months(DateConvert)

mvt$Weekday = weekdays(DateConvert)

This creates two new variables in our data frame, Month and Weekday, and sets them equal to the month and weekday values that we can extract from the Date object. Lastly, replace the old Date variable with DateConvert by typing:

mvt$Date = DateConvert
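A quick sketch of what months and weekdays return, on two made-up Date values. Both functions return names in the current locale, so the English names in the tables below assume an English locale; format with "%m" is shown as a locale-independent alternative.

```r
# Two made-up dates.
d <- as.Date(c("2012-12-31", "2001-01-01"))

month_names <- months(d)     # month names in the current locale
weekday_names <- weekdays(d) # weekday names in the current locale
month_names
weekday_names

# Locale-independent month numbers, as character strings.
month_numbers <- format(d, "%m")  # "12" "01"
month_numbers
```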

In which month did the fewest motor vehicle thefts occur?

table(mvt$Month) 
## 
##     April    August  December  February   January      July      June     March       May  November   October September 
##     15280     16572     16426     13511     16047     16801     16002     15758     16035     16063     17086     16060

Explanation: If you type table(mvt$Month), you can see that the month with the smallest number of observations is February.
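The table lists months alphabetically because Month is a plain character vector. One way to get calendar order, and to pick out the smallest count without scanning by eye, is sketched below on made-up values. It relies on R's built-in month.name constant (English month names); the vector month and the restriction to three months are illustrative.

```r
# Made-up month values covering only the first three months.
month <- c("March", "January", "March", "February", "January", "March")

# A factor with explicit levels fixes the column order of the table.
counts <- table(factor(month, levels = month.name[1:3]))
counts  # columns in calendar order: January, February, March

# which.min returns the position of the smallest count, with its name.
fewest <- names(which.min(counts))
fewest  # "February"
```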

On which weekday did the most motor vehicle thefts occur?

table(mvt$Weekday)
## 
##    Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday 
##     29284     27397     27118     26316     27319     26791     27416

Explanation: If you type table(mvt$Weekday), you can see that the weekday with the largest number of observations is Friday.
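As a sanity check, the weekday counts above can be typed into a named vector to confirm that they add up to the 191641 observations and to express each day as a share of the total. The vector name weekday_counts is illustrative; the numbers are copied from the table output.

```r
# Counts copied from the table(mvt$Weekday) output, in weekday order.
weekday_counts <- c(Monday = 27397, Tuesday = 26791, Wednesday = 27416,
                    Thursday = 27319, Friday = 29284, Saturday = 27118,
                    Sunday = 26316)

total <- sum(weekday_counts)            # 191641, matches the number of rows
busiest <- names(which.max(weekday_counts))
busiest                                 # "Friday"

# Percentage of all thefts falling on each weekday, rounded to one decimal.
round(100 * weekday_counts / total, 1)
```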

Which month has the largest number of motor vehicle thefts for which an arrest was made?

table(mvt$Arrest,mvt$Month)
##        
##         April August December February January  July  June March   May November October September
##   FALSE 14028  15243    15029    12273   14612 15477 14772 14460 14848    14807   15744     14812
##   TRUE   1252   1329     1397     1238    1435  1324  1230  1298  1187     1256    1342      1248

Explanation: If you type table(mvt$Arrest,mvt$Month), you can see that the largest number of observations with Arrest=TRUE occurs in the month of January.
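The two-way table can also be indexed by row name, and prop.table turns the counts into within-month proportions, which separates "most arrests" from "highest arrest rate". A sketch on made-up values (arrest, month and tab are illustrative stand-ins):

```r
# Made-up data: six thefts across two months.
arrest <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
month  <- c("January", "January", "March", "March", "January", "March")

tab <- table(arrest, month)

# Row "TRUE" holds the number of arrests in each month.
arrests_by_month <- tab["TRUE", ]
top_month <- names(which.max(arrests_by_month))
top_month  # "January"

# margin = 2 divides each column by its total, giving the arrest rate per month.
rates <- prop.table(tab, margin = 2)
rates["TRUE", ]
```

On the real data the same two lines would be applied to table(mvt$Arrest, mvt$Month).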