Section 1 - Loading the Data

1.1

How many rows of data (observations) are in this dataset?

mvt = read.csv("Unit1/mvtWeek1.csv")
nrow(mvt)
[1] 191641

1.2

How many variables are in this dataset?

str(mvt)
'data.frame':   191641 obs. of  11 variables:
 $ ID                 : int  8951354 8951141 8952745 8952223 8951608 8950793 8950760 8951611 8951802 8950706 ...
 $ Date               : Factor w/ 131680 levels "1/1/01 0:01",..: 42824 42823 42823 42823 42822 42821 42820 42819 42817 42816 ...
 $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 62 72 72 72 72 72 72 72 ...
 $ Arrest             : logi  FALSE FALSE FALSE FALSE FALSE TRUE ...
 $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ Beat               : int  623 1213 1622 724 211 2521 423 231 1021 1215 ...
 $ District           : int  6 12 16 7 2 25 4 2 10 12 ...
 $ CommunityArea      : int  69 24 11 67 35 19 48 40 29 24 ...
 $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
 $ Latitude           : num  41.8 41.9 42 41.8 41.8 ...
 $ Longitude          : num  -87.6 -87.7 -87.8 -87.7 -87.6 ...
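
The str() output above shows Date and LocationDescription as factors. Whether read.csv produces factors or plain character strings depends on its stringsAsFactors argument, whose default changed to FALSE in R 4.0.0. The following is a minimal sketch of the difference, assuming a small inline CSV with made-up rows in place of mvtWeek1.csv; csv_text, as_factor and as_char are names introduced only for this illustration.

```r
# Toy CSV text standing in for mvtWeek1.csv (hypothetical rows, not real data).
csv_text <- "ID,Date,Arrest
1,5/16/08 0:00,FALSE
2,10/17/01 22:00,TRUE
3,10/17/01 22:00,FALSE"

# Request factors explicitly, which matches the str() output shown above.
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Request plain character strings (the default since R 4.0.0).
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$Date)   # factor
class(as_char$Date)     # character
nrow(as_factor)         # number of observations
ncol(as_factor)         # number of variables
```

Either form works for the date conversion in Section 2, because strptime accepts both factors and character strings.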

1.3

Using the “max” function, what is the maximum value of the variable “ID”?

max(mvt$ID)
[1] 9181151

1.4

What is the minimum value of the variable “Beat”?

min(mvt$Beat)
[1] 111
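
ID and Beat have no missing values, so max and min return a number directly. Columns such as District do contain NA values, and there the same calls would return NA unless na.rm = TRUE is passed. A small sketch, assuming a made-up vector named district rather than the real column:

```r
# Toy vector standing in for a column with missing values (hypothetical data).
district <- c(6, 12, NA, 7, 2)

max(district)                  # NA, because one value is missing
max(district, na.rm = TRUE)    # largest non-missing value
min(district, na.rm = TRUE)    # smallest non-missing value
range(district, na.rm = TRUE)  # both at once, as c(min, max)
sum(is.na(district))           # how many values are missing
```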

1.5

How many observations have value TRUE in the Arrest variable (this is the number of crimes for which an arrest was made)?

summary(mvt)
       ID                      Date       
 Min.   :1310022   5/16/08 0:00  :    11  
 1st Qu.:2832144   10/17/01 22:00:    10  
 Median :4762956   4/13/04 21:00 :    10  
 Mean   :4968629   9/17/05 22:00 :    10  
 3rd Qu.:7201878   10/12/01 22:00:     9  
 Max.   :9181151   10/13/01 22:00:     9  
                   (Other)       :191582  
                     LocationDescription
 STREET                        :156564  
 PARKING LOT/GARAGE(NON.RESID.): 14852  
 OTHER                         :  4573  
 ALLEY                         :  2308  
 GAS STATION                   :  2111  
 DRIVEWAY - RESIDENTIAL        :  1675  
 (Other)                       :  9558  
   Arrest         Domestic            Beat     
 Mode :logical   Mode :logical   Min.   : 111  
 FALSE:176105    FALSE:191226    1st Qu.: 722  
 TRUE :15536     TRUE :415       Median :1121  
                                 Mean   :1259  
                                 3rd Qu.:1733  
                                 Max.   :2535  
                                               
    District     CommunityArea        Year     
 Min.   : 1.00   Min.   : 0      Min.   :2001  
 1st Qu.: 6.00   1st Qu.:22      1st Qu.:2003  
 Median :10.00   Median :32      Median :2006  
 Mean   :11.82   Mean   :38      Mean   :2006  
 3rd Qu.:17.00   3rd Qu.:60      3rd Qu.:2009  
 Max.   :31.00   Max.   :77      Max.   :2012  
 NA's   :43056   NA's   :24616                 
    Latitude       Longitude     
 Min.   :41.64   Min.   :-87.93  
 1st Qu.:41.77   1st Qu.:-87.72  
 Median :41.85   Median :-87.68  
 Mean   :41.84   Mean   :-87.68  
 3rd Qu.:41.92   3rd Qu.:-87.64  
 Max.   :42.02   Max.   :-87.52  
 NA's   :2276    NA's   :2276    
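
The Arrest block of the summary gives the count directly (TRUE : 15536). A logical column can also be counted on its own, since R treats TRUE as 1 and FALSE as 0 in arithmetic. A minimal sketch, assuming a made-up vector named arrest in place of mvt$Arrest:

```r
# Toy logical vector standing in for mvt$Arrest (hypothetical values).
arrest <- c(FALSE, TRUE, FALSE, FALSE, TRUE)

sum(arrest)     # number of TRUE values
table(arrest)   # counts of FALSE and TRUE side by side
mean(arrest)    # proportion of TRUE values
```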

1.6

How many observations have a LocationDescription value of ALLEY?

summary(mvt)
(The output is identical to the one shown in 1.5. Its LocationDescription block lists ALLEY : 2308.)
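
summary only prints the most frequent levels of a factor and lumps the rest into (Other), so a level that is not near the top needs a direct count. A minimal sketch, assuming a made-up factor named location in place of mvt$LocationDescription:

```r
# Toy factor standing in for mvt$LocationDescription (hypothetical values).
location <- factor(c("STREET", "ALLEY", "STREET",
                     "GAS STATION", "ALLEY", "STREET"))

sum(location == "ALLEY")         # count one level directly

counts <- table(location)        # counts for every level
counts[["ALLEY"]]                # look up a single level by name
sort(counts, decreasing = TRUE)  # most frequent levels first
```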

Section 2 - Understanding Dates in R

In many datasets, like this one, you have a date field. Unfortunately, R does not automatically recognize entries that look like dates. We need to use a function in R to extract the date and time. Take a look at the first entry of Date (remember to use square brackets when looking at a certain entry of a variable).
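
Square-bracket indexing can be sketched as follows, assuming a short vector named dates whose values are copied from the summary output in 1.5 rather than from the first rows of the file:

```r
# Toy character vector standing in for mvt$Date (values taken from the
# summary output, not from the first rows of the file).
dates <- c("5/16/08 0:00", "10/17/01 22:00", "4/13/04 21:00")

dates[1]        # the first entry
dates[1:2]      # the first two entries
head(dates, 2)  # the same two entries via head()
```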

2.1

In what format are the entries in the variable Date?

  • Month/Day/Year Hour:Minute
  • Day/Month/Year Hour:Minute
  • Hour:Minute Month/Day/Year
  • Hour:Minute Day/Month/Year

summary(mvt)
(The output is identical to the one shown in 1.5. Date values such as 10/17/01 22:00 have a second field greater than 12, so it must be the day: the format is Month/Day/Year Hour:Minute.)

2.2

Now, let’s convert these characters into a Date object in R. In your R console, type

DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))

This parses the text in the variable “Date” and stores the result in DateConvert as a Date object; as.Date keeps the calendar date and drops the time of day. Take a look at the variable DateConvert using the summary function.

What is the month and year of the median date in our dataset? Enter your answer as “Month Year”, without the quotes. (Ex: if the answer was 2008-03-28, you would give the answer “March 2008”, without the quotes.)

median(DateConvert)
[1] "2006-05-21"
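
The format string passed to strptime can be read field by field. The sketch below assumes three date strings copied from the summary output and names (raw, parsed, converted) introduced only for illustration. The tz = "UTC" argument is an addition not present in the original command, used here so the result does not depend on the machine's time zone.

```r
# Three strings in the dataset's Month/Day/Year Hour:Minute format.
raw <- c("5/16/08 0:00", "10/17/01 22:00", "4/13/04 21:00")

# %m = month, %d = day of month, %y = two-digit year,
# %H = hour (00-23), %M = minute.
# tz = "UTC" is an assumption added for reproducibility.
parsed <- strptime(raw, format = "%m/%d/%y %H:%M", tz = "UTC")

# as.Date keeps the calendar date and drops the time of day.
converted <- as.Date(parsed)

converted                            # Date values in ISO year-month-day form
median(converted)                    # the middle date
format(median(converted), "%Y-%m")   # numeric year and month of the median
```

The numeric "%Y-%m" format is used instead of a month name because month names depend on the session's locale.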

2.3

Now, let’s extract the month and the day of the week, and add these variables to our data frame mvt. We can do this with two simple functions. Type the following commands in R:

mvt$Month = months(DateConvert)

mvt$Weekday = weekdays(DateConvert)

This creates two new variables in our data frame, Month and Weekday, and sets them equal to the month and weekday values that we can extract from the Date object. Lastly, replace the old Date variable with DateConvert by typing:

mvt$Date = DateConvert

Using the table command, answer the following questions.

In which month did the fewest motor vehicle thefts occur?

table(mvt$Month)

  October  November  December   January  February     March     April       May      June 
    17086     16063     16426     16047     13511     15758     15280     16035     16002 
     July    August September 
    16801     16572     16060 
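
months() returns names in the session's locale, so the labels and their ordering in the table vary between machines. One way to get English names in calendar order regardless of locale is to extract the numeric month and map it through R's built-in month.name constant. This is a sketch under that assumption, using a handful of made-up dates named d:

```r
# Toy dates standing in for DateConvert (hypothetical values).
d <- as.Date(c("2012-01-15", "2012-02-03", "2012-02-20",
               "2012-03-09", "2012-03-10", "2012-03-11"))

# "%m" yields the month as a two-digit number, which is locale-independent.
month_num <- as.integer(format(d, "%m"))

# Map to English names and fix the level order to the calendar order;
# droplevels removes months that do not occur in the toy data.
month <- droplevels(factor(month.name[month_num], levels = month.name))

counts <- table(month)
counts                     # one column per month, in calendar order
names(which.min(counts))   # month with the fewest entries
```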

2.4

On which weekday did the most motor vehicle thefts occur?

table(mvt$Weekday)

  Tuesday  Saturday    Sunday Wednesday  Thursday    Friday    Monday 
    26791     27118     26316     27416     27319     29284     27397 
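
Rather than scanning the table by eye, the largest count can be picked out with which.max. The sketch below also avoids locale-dependent weekday names by using the numeric wday field of a POSIXlt value (0 = Sunday through 6 = Saturday). The dates in d and the day_names vector are made up for this illustration.

```r
# Toy dates standing in for DateConvert: two Fridays, a Saturday, a Monday.
d <- as.Date(c("2012-12-28", "2012-12-28", "2012-12-29", "2012-12-31"))

day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# wday is 0 for Sunday through 6 for Saturday, so add 1 to index day_names.
wday <- as.POSIXlt(d)$wday
weekday <- factor(day_names[wday + 1], levels = day_names)

counts <- table(weekday)
names(which.max(counts))          # weekday with the most entries
sort(counts, decreasing = TRUE)   # all weekdays, busiest first
```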

2.5

Each observation in the dataset represents a motor vehicle theft, and the Arrest variable indicates whether an arrest was later made for this theft. Which month has the largest number of motor vehicle thefts for which an arrest was made?

table(mvt$Arrest,mvt$Month)
       
        October November December January February March April
  FALSE   15744    14807    15029   14612    12273 14460 14028
  TRUE     1342     1256     1397    1435     1238  1298  1252
       
          May  June  July August September
  FALSE 14848 14772 15477  15243     14812
  TRUE   1187  1230  1324   1329      1248
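
In a two-way table the answer sits in the TRUE row, which can be extracted by name and searched with which.max. A minimal sketch, assuming made-up vectors named arrest and month in place of the real columns:

```r
# Toy vectors standing in for mvt$Arrest and mvt$Month (hypothetical values).
arrest <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
month  <- c("January", "January", "January",
            "February", "February", "February")

tab <- table(arrest, month)

# Pull out the TRUE row: arrests per month.
arrests_by_month <- tab["TRUE", ]
names(which.max(arrests_by_month))   # month with the most arrests

# Share of thefts in each month that led to an arrest
# (margin = 2 normalises each column to sum to 1).
prop.table(tab, margin = 2)["TRUE", ]
```

The raw count answers the question as asked; the column-wise proportion is a separate quantity that accounts for months having different numbers of thefts.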
