Section 1 - Loading the Data

1.1

How many rows of data (observations) are in this dataset?

nrow(D)
[1] 191641

1.2

How many variables are in this dataset?

ncol(D)
[1] 11
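
Both counts can also be read in one call with dim(). A minimal sketch on a small made-up data frame (the name toy and its values are illustrative assumptions, not part of the dataset):

```r
# Toy stand-in for the real data frame; the values are invented for illustration.
toy <- data.frame(
  ID     = c(101, 102, 103),
  Beat   = c(111, 2431, 623),
  Arrest = c(FALSE, TRUE, FALSE)
)

dims   <- dim(toy)   # integer vector: c(number of rows, number of columns)
n_rows <- dims[1]    # same value as nrow(toy)
n_cols <- dims[2]    # same value as ncol(toy)
print(dims)
```

str(toy) additionally lists each variable with its type, which helps before answering questions about individual columns.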

1.3

Using the “max” function, what is the maximum value of the variable “ID”?

max(D$ID)
[1] 9181151

1.4

What is the minimum value of the variable “Beat”?

min(D$Beat)
[1] 111

1.5

How many observations have value TRUE in the Arrest variable (this is the number of crimes for which an arrest was made)?

summary(D$Arrest) #15536
   Mode   FALSE    TRUE 
logical  176105   15536 

1.6

How many observations have a LocationDescription value of ALLEY?

summary(D$LocationDescription == "ALLEY") #2308
   Mode   FALSE    TRUE 
logical  189333    2308 
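
Because TRUE counts as 1 and FALSE as 0, sum() on a logical vector gives the same counts directly. A small sketch, assuming a made-up data frame named toy (hypothetical values):

```r
# Toy data; column names mirror the dataset, values are invented.
toy <- data.frame(
  Arrest = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  LocationDescription = c("ALLEY", "STREET", "ALLEY", "STREET", "STREET"),
  stringsAsFactors = FALSE
)

# sum() adds up the TRUE values of a logical vector.
n_arrests <- sum(toy$Arrest)
n_alley   <- sum(toy$LocationDescription == "ALLEY")

print(n_arrests)
print(n_alley)
```

If a column may contain missing values, sum(..., na.rm = TRUE) avoids an NA result.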

Section 2 - Understanding Dates in R

Many datasets, like this one, include a date field. R does not automatically recognize entries that look like dates; here the Date column was read in as a factor of text strings, so we need a function to parse the date and time. Take a look at the first entry of Date (remember to use square brackets when looking at a certain entry of a variable).

2.1

In what format are the entries in the variable Date?

  • Month/Day/Year Hour:Minute
  • Day/Month/Year Hour:Minute
  • Hour:Minute Month/Day/Year
  • Hour:Minute Day/Month/Year

head(D$Date) #Month/Day/Year Hour:Minute
[1] 12/31/12 23:15 12/31/12 22:00 12/31/12 22:00 12/31/12 22:00
[5] 12/31/12 21:30 12/31/12 20:30
131680 Levels: 10/10/01 0:00 10/10/01 0:01 10/10/01 0:30 ... 9/9/12 9:50
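
The "Levels" line in the output shows that Date is stored as a factor, not as a date. A short sketch of looking at a single entry with square brackets, using a made-up factor named dates (an assumption for illustration):

```r
# A small factor of date-like strings, standing in for D$Date.
dates <- factor(c("12/31/12 23:15", "12/31/12 22:00", "1/1/01 0:01"))

first_entry <- dates[1]                  # still a factor, prints with its levels
first_text  <- as.character(first_entry) # the underlying text only

print(class(dates))
print(first_text)
```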

2.2

Now, let’s convert these text entries into a Date object in R. In your R console, type the following (the instructions call the data frame mvt; the answers here use D):

DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))

This converts the variable “Date” into a Date object in R. Take a look at the variable DateConvert using the summary function.

What is the month and year of the median date in our dataset? Enter your answer as “Month Year”, without the quotes. (Ex: if the answer was 2008-03-28, you would give the answer “March 2008”, without the quotes.)

DateConvert = as.Date(strptime(D$Date, "%m/%d/%y %H:%M"))
median(DateConvert)
[1] "2006-05-21"
# May 2006
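
The format string tells strptime how to read each piece of the text: %m is the month, %d the day, %y the two-digit year, %H the hour and %M the minute. A minimal sketch on three made-up strings (the vector raw and the tz = "UTC" argument are assumptions added to keep the example self-contained):

```r
# Hypothetical sample strings in the same Month/Day/Year Hour:Minute layout.
raw <- c("12/31/12 23:15", "1/1/01 0:01", "5/21/06 12:30")

# Parse text into date-times, then drop the time of day.
parsed   <- strptime(raw, format = "%m/%d/%y %H:%M", tz = "UTC")
as_dates <- as.Date(parsed)

print(as_dates)
print(median(as_dates))  # the middle date once the values are ordered
```

Entries that do not match the format come back as NA, so checking sum(is.na(as_dates)) after a conversion is a quick sanity test.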

2.3

Now, let’s extract the month and the day of the week, and add these variables to our data frame mvt. We can do this with two simple functions. Type the following commands in R:

mvt$Month = months(DateConvert)

mvt$Weekday = weekdays(DateConvert)

This creates two new variables in our data frame, Month and Weekday, and sets them equal to the month and weekday values that we can extract from the Date object. Lastly, replace the old Date variable with DateConvert by typing:

mvt$Date = DateConvert
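
As a small illustration of what months() and weekdays() return, here is a sketch on two made-up dates (the vector d is hypothetical). The names these functions produce depend on the R session's locale, so the sketch also shows numeric codes from format(), which do not:

```r
# Two example dates, not taken from the dataset.
d <- as.Date(c("2012-12-31", "2006-05-21"))

month_name   <- months(d)    # month names in the session's language
weekday_name <- weekdays(d)  # weekday names in the session's language

# Locale-independent alternatives as zero-padded / numeric codes.
month_num   <- format(d, "%m")  # "01" to "12"
weekday_num <- format(d, "%u")  # "1" = Monday through "7" = Sunday

print(month_name)
print(weekday_name)
```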

Using the table command, answer the following questions.

In which month did the fewest motor vehicle thefts occur?

D$Month = months(DateConvert)
D$Weekday = weekdays(DateConvert)
D$Date = DateConvert
table(D$Month) #February 13511

    April    August  December  February   January      July      June 
    15280     16572     16426     13511     16047     16801     16002 
    March       May  November   October September 
    15758     16035     16063     17086     16060 
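
Rather than scanning the table by eye, the smallest count can be picked out with sort() or which.min(). A sketch on a made-up vector of month names (the vector month is an assumption for illustration):

```r
# Hypothetical month labels, one per observation.
month <- c("January", "February", "January", "March", "March", "January")

counts        <- table(month)
sorted_counts <- sort(counts)              # ascending, smallest count first
fewest        <- names(which.min(counts))  # label of the smallest count

print(sorted_counts)
print(fewest)
```

which.min() returns only the first minimum, so ties are worth checking in the sorted table.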

2.4

On which weekday did the most motor vehicle thefts occur?

table(D$Weekday) #Friday 29284

   Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday 
    29284     27397     27118     26316     27319     26791     27416 
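
table() lists the weekdays alphabetically, which makes the output harder to read. One option is to convert the names to a factor with explicit levels first. A sketch, assuming English weekday names and a made-up vector weekday:

```r
# Hypothetical weekday labels, one per observation.
weekday <- c("Friday", "Monday", "Friday", "Sunday", "Friday", "Monday")

# Assumed ordering; the names must match what weekdays() returns in your locale.
day_order <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

counts  <- table(factor(weekday, levels = day_order))  # columns in calendar order
busiest <- names(which.max(counts))                    # label of the largest count

print(counts)
print(busiest)
```

Days that never occur appear with a count of 0, which is expected when levels are fixed in advance.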

2.5

Each observation in the dataset represents a motor vehicle theft, and the Arrest variable indicates whether an arrest was later made for this theft. Which month has the largest number of motor vehicle thefts for which an arrest was made?

table(D$Month,D$Arrest) #January 1435
           
            FALSE  TRUE
  April     14028  1252
  August    15243  1329
  December  15029  1397
  February  12273  1238
  January   14612  1435
  July      15477  1324
  June      14772  1230
  March     14460  1298
  May       14848  1187
  November  14807  1256
  October   15744  1342
  September 14812  1248
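
The TRUE column of a two-way table can be pulled out by name, after which which.max() gives the month with the most arrests. A sketch on made-up data (the data frame toy and its values are assumptions); it also computes the share of thefts with an arrest per month, which is a different question from the raw count asked here:

```r
# Toy data; column names mirror the dataset, values are invented.
toy <- data.frame(
  Month  = c("January", "January", "February", "February", "February", "March"),
  Arrest = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

tab          <- table(toy$Month, toy$Arrest)  # rows: months, columns: FALSE / TRUE
arrests      <- tab[, "TRUE"]                 # arrest counts per month
most_arrests <- names(which.max(arrests))

# Row-wise proportions: fraction of each month's thefts that led to an arrest.
arrest_rate <- prop.table(tab, margin = 1)[, "TRUE"]

print(arrests)
print(most_arrests)
print(arrest_rate)
```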
