Section 1 - Loading the Data

1.1

How many rows of data (observations) are in this dataset?

mvt=read.csv("mvtWeek1.csv")
nrow(mvt)
[1] 191641

1.2

How many variables are in this dataset?

mvt=read.csv("mvtWeek1.csv")
str(mvt)
'data.frame':   191641 obs. of  11 variables:
 $ ID                 : int  8951354 8951141 8952745 8952223 8951608 8950793 8950760 8951611 8951802 8950706 ...
 $ Date               : Factor w/ 131680 levels "1/1/01 0:01",..: 42824 42823 42823 42823 42822 42821 42820 42819 42817 42816 ...
 $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 62 72 72 72 72 72 72 72 ...
 $ Arrest             : logi  FALSE FALSE FALSE FALSE FALSE TRUE ...
 $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ Beat               : int  623 1213 1622 724 211 2521 423 231 1021 1215 ...
 $ District           : int  6 12 16 7 2 25 4 2 10 12 ...
 $ CommunityArea      : int  69 24 11 67 35 19 48 40 29 24 ...
 $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
 $ Latitude           : num  41.8 41.9 42 41.8 41.8 ...
 $ Longitude          : num  -87.6 -87.7 -87.8 -87.7 -87.6 ...
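When only the counts are needed, base R's nrow, ncol and dim return them directly without printing the whole structure. A minimal sketch on a small made-up data frame (toy and its values are hypothetical, not the course data):

```r
# Hypothetical stand-in for mvt; the real data comes from mvtWeek1.csv.
toy <- data.frame(
  ID     = c(101L, 102L, 103L),
  Arrest = c(FALSE, TRUE, FALSE),
  Beat   = c(623L, 1213L, 111L)
)

n_obs  <- nrow(toy)  # number of rows (observations)
n_vars <- ncol(toy)  # number of columns (variables)
shape  <- dim(toy)   # both at once: rows first, then columns
print(shape)
```

Applied to the real data frame, the same calls would be nrow(mvt), ncol(mvt) and dim(mvt).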

1.3

Using the “max” function, what is the maximum value of the variable “ID”?

max(mvt$ID)
[1] 9181151

1.4

What is the minimum value of the variable “Beat”?

min(mvt$Beat)
[1] 111

1.5

How many observations have value TRUE in the Arrest variable (this is the number of crimes for which an arrest was made)?

summary(mvt)
       ID                      Date                            LocationDescription
 Min.   :1310022   5/16/08 0:00  :    11   STREET                        :156564  
 1st Qu.:2832144   10/17/01 22:00:    10   PARKING LOT/GARAGE(NON.RESID.): 14852  
 Median :4762956   4/13/04 21:00 :    10   OTHER                         :  4573  
 Mean   :4968629   9/17/05 22:00 :    10   ALLEY                         :  2308  
 3rd Qu.:7201878   10/12/01 22:00:     9   GAS STATION                   :  2111  
 Max.   :9181151   10/13/01 22:00:     9   DRIVEWAY - RESIDENTIAL        :  1675  
                   (Other)       :191582   (Other)                       :  9558  
   Arrest         Domestic            Beat         District     CommunityArea        Year     
 Mode :logical   Mode :logical   Min.   : 111   Min.   : 1.00   Min.   : 0      Min.   :2001  
 FALSE:176105    FALSE:191226    1st Qu.: 722   1st Qu.: 6.00   1st Qu.:22      1st Qu.:2003  
 TRUE :15536     TRUE :415       Median :1121   Median :10.00   Median :32      Median :2006  
                                 Mean   :1259   Mean   :11.82   Mean   :38      Mean   :2006  
                                 3rd Qu.:1733   3rd Qu.:17.00   3rd Qu.:60      3rd Qu.:2009  
                                 Max.   :2535   Max.   :31.00   Max.   :77      Max.   :2012  
                                                NA's   :43056   NA's   :24616                 
    Latitude       Longitude     
 Min.   :41.64   Min.   :-87.93  
 1st Qu.:41.77   1st Qu.:-87.72  
 Median :41.85   Median :-87.68  
 Mean   :41.84   Mean   :-87.68  
 3rd Qu.:41.92   3rd Qu.:-87.64  
 Max.   :42.02   Max.   :-87.52  
 NA's   :2276    NA's   :2276    
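The TRUE count can also be read off without scanning the full summary. The sketch below assumes a short made-up logical vector (arrest is hypothetical) in place of mvt$Arrest:

```r
# Hypothetical logical vector standing in for mvt$Arrest.
arrest <- c(FALSE, TRUE, FALSE, TRUE, TRUE, NA)

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of TRUEs.
# na.rm = TRUE skips missing values instead of returning NA.
n_true <- sum(arrest, na.rm = TRUE)

# table() tabulates every distinct value; useNA = "ifany" also lists NAs.
counts <- table(arrest, useNA = "ifany")
print(n_true)
print(counts)
```

On the course data, sum(mvt$Arrest) or table(mvt$Arrest) should agree with the TRUE row of the summary above.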

1.6

How many observations have a LocationDescription value of ALLEY?

summary(mvt)
(The output is identical to the summary shown in 1.5; the LocationDescription column lists ALLEY with 2308 observations.)
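summary only lists the most frequent levels of a factor, so a level outside the top few would be hidden under (Other). A hedged sketch of two ways to count a single category, using a made-up factor (loc is hypothetical) in place of mvt$LocationDescription:

```r
# Hypothetical values standing in for mvt$LocationDescription.
loc <- factor(c("STREET", "ALLEY", "STREET", "GAS STATION", "ALLEY", "STREET"))

# Option 1: tabulate all levels, then pick one by name.
by_level <- table(loc)
n_alley_table <- as.integer(by_level["ALLEY"])

# Option 2: compare against the label and sum the resulting logical vector.
n_alley_sum <- sum(loc == "ALLEY")

print(n_alley_table)
print(n_alley_sum)
```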

Section 2 - Understanding Dates in R

In many datasets, like this one, you have a date field. Unfortunately, R does not automatically recognize entries that look like dates. We need to use a function in R to extract the date and time. Take a look at the first entry of Date (remember to use square brackets when looking at a certain entry of a variable).

2.1

In what format are the entries in the variable Date?

  • Month/Day/Year Hour:Minute
  • Day/Month/Year Hour:Minute
  • Hour:Minute Month/Day/Year
  • Hour:Minute Day/Month/Year

mvt$Date[1]
[1] 12/31/12 23:15
131680 Levels: 1/1/01 0:01 1/1/01 0:05 1/1/01 0:30 1/1/01 1:17 1/1/01 1:50 ... 9/9/12 9:50

2.2

Now, let’s convert these characters into a Date object in R. In your R console, type

DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))

This converts the variable “Date” into a Date object in R. Take a look at the variable DateConvert using the summary function.

What is the month and year of the median date in our dataset? Enter your answer as “Month Year”, without the quotes. (Ex: if the answer was 2008-03-28, you would give the answer “March 2008”, without the quotes.)

DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))
summary(DateConvert)
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
"2001-01-01" "2003-07-10" "2006-05-21" "2006-08-23" "2009-10-24" "2012-12-31" 
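To see how the format string maps onto the text, here is a small sketch on made-up strings in the same layout (raw is hypothetical, and tz = "UTC" is an assumption added here so the result does not depend on the machine's time zone):

```r
# Hypothetical sample strings in the Month/Day/Year Hour:Minute layout.
raw <- c("12/31/12 23:15", "1/1/01 0:01", "5/16/08 0:00")

# %m = month, %d = day, %y = two-digit year, %H = hour (0-23), %M = minute.
# The original call omits tz and therefore uses the local time zone.
parsed <- strptime(raw, format = "%m/%d/%y %H:%M", tz = "UTC")

# as.Date() drops the time of day and keeps the calendar date.
dates <- as.Date(parsed)
print(dates)

# A string that does not match the format becomes NA rather than an error
# (here 31 is not a valid month).
bad <- as.Date(strptime("31/12/12 23:15", format = "%m/%d/%y %H:%M", tz = "UTC"))
print(is.na(bad))

# Date objects are ordered, so order statistics such as the median work.
mid_date <- median(dates)
print(mid_date)
```

Checking for NA values after the conversion, for example with sum(is.na(DateConvert)), is a cheap way to confirm that the chosen format matched every entry.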

2.3

Now, let’s extract the month and the day of the week, and add these variables to our data frame mvt. We can do this with two simple functions. Type the following commands in R:

mvt$Month = months(DateConvert)

mvt$Weekday = weekdays(DateConvert)

This creates two new variables in our data frame, Month and Weekday, and sets them equal to the month and weekday values that we can extract from the Date object. Lastly, replace the old Date variable with DateConvert by typing:

mvt$Date = DateConvert

Using the table command, answer the following questions.

In which month did the fewest motor vehicle thefts occur?

mvt$Month = months(DateConvert)
mvt$Weekday = weekdays(DateConvert)
mvt$Date = DateConvert
table(mvt$Month)

    April    August  December  February   January      July      June     March       May  November   October September
    15280     16572     16426     13511     16047     16801     16002     15758     16035     16063     17086     16060

2.4

On which weekday did the most motor vehicle thefts occur?

table(mvt$Weekday)

   Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday
    29284     27397     27118     26316     27319     26791     27416
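months() and weekdays() return names in the language of the session's LC_TIME locale, so the labels in these tables can differ between machines. A minimal sketch, assuming the "C" locale is available (it normally is) and using made-up dates, that forces English names and picks out the most frequent label programmatically:

```r
# Remember the current setting, then switch date names to the "C" locale,
# which yields English month and weekday names.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

d <- as.Date(c("2012-12-31", "2012-12-28", "2001-01-05"))  # hypothetical dates
month_names   <- months(d)
weekday_names <- weekdays(d)

# which.max() returns the position of the largest count (which.min() the
# smallest); names() turns that position into its label.
weekday_counts <- table(weekday_names)
busiest <- names(which.max(weekday_counts))
print(busiest)

Sys.setlocale("LC_TIME", old_locale)  # restore the previous setting
```

The same pattern applied to table(mvt$Month) with which.min would name the month with the fewest thefts.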

2.5

Each observation in the dataset represents a motor vehicle theft, and the Arrest variable indicates whether an arrest was later made for this theft. Which month has the largest number of motor vehicle thefts for which an arrest was made?

table(mvt$Arrest, mvt$Month)

        April August December February January  July  June March   May November October September
  FALSE 14028  15243    15029    12273   14612 15477 14772 14460 14848    14807   15744     14812
  TRUE   1252   1329     1397     1238    1435  1324  1230  1298  1187     1256    1342      1248
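A two-way table can also be queried directly instead of being read by eye. The sketch below uses a made-up miniature of the two columns (arrest and month are hypothetical) to pull out the TRUE row and find its largest entry:

```r
# Hypothetical miniature of the Arrest and Month columns.
arrest <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
month  <- c("January", "January", "January", "July", "July", "March", "March")

# Rows follow the first argument (arrest), columns the second (month).
by_month <- table(arrest, month)

# Select the TRUE row by name, then find the month with the largest count.
arrests_per_month <- by_month["TRUE", ]
top_month <- names(which.max(arrests_per_month))
print(top_month)

# Share of thefts in each month that led to an arrest (column proportions).
arrest_rate <- prop.table(by_month, margin = 2)["TRUE", ]
print(round(arrest_rate, 3))
```

Counts and rates can rank months differently, which is why the proportion is shown alongside the raw number of arrests.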
