Section 1 - Loading the Data

1.1

How many rows of data (observations) are in this dataset?

mvt <- read.csv("mvtWeek1.csv")
nrow(mvt)
## [1] 191641

1.2

How many variables are in this dataset?

ncol(mvt)
## [1] 11

1.3

Using the “max” function, what is the maximum value of the variable “ID”?

max(mvt$ID)
## [1] 9181151

1.4

What is the minimum value of the variable “Beat”?

min(mvt$Beat)
## [1] 111

1.5

How many observations have value TRUE in the Arrest variable (this is the number of crimes for which an arrest was made)?

length(which(mvt$Arrest=="TRUE"))
## [1] 15536
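
When a TRUE/FALSE column has been read in as a logical vector, the same count can also be taken with sum() or table(). The sketch below uses a small made-up vector named arrest (hypothetical, standing in for mvt$Arrest) to show that these agree with the length(which(...)) approach.

```r
# Toy logical vector standing in for mvt$Arrest (values are made up).
arrest <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of TRUE values.
n_true <- sum(arrest)

# table() reports the counts of both values at once.
counts <- table(arrest)

# Same result as the which()-based count used above.
n_which <- length(which(arrest == "TRUE"))

print(n_true)
print(counts)
```

Note that sum() returns NA if the vector contains missing values, whereas which() silently skips them, so the two are interchangeable only when the column has no NAs.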

1.6

How many observations have a LocationDescription value of ALLEY?

length(which(mvt$LocationDescription=="ALLEY"))
## [1] 2308

Section 2 - Understanding Dates in R

In many datasets, like this one, you have a date field. Unfortunately, R does not automatically recognize entries that look like dates. We need to use a function in R to extract the date and time. Take a look at the first entry of Date (remember to use square brackets when looking at a certain entry of a variable).

2.1

In what format are the entries in the variable Date?

  • Month/Day/Year Hour:Minute
  • Day/Month/Year Hour:Minute
  • Hour:Minute Month/Day/Year
  • Hour:Minute Day/Month/Year

mvt$Date[1]
## [1] 12/31/12 23:15
## 131680 Levels: 1/1/01 0:01 1/1/01 0:05 1/1/01 0:30 1/1/01 1:17 ... 9/9/12 9:50

2.2

Now, let’s convert these characters into a Date object in R. In your R console, type

DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))

This converts the variable “Date” into a Date object in R. Take a look at the variable DateConvert using the summary function.
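
To see what the conversion does, it can help to run it on a single entry. The sketch below uses one string in the same layout as the first Date entry; the variable names raw, parsed and d are illustrative only.

```r
# One raw entry in the same Month/Day/Year Hour:Minute layout as mvt$Date.
raw <- "12/31/12 23:15"

# strptime() parses the text according to the format string:
# %m = month, %d = day, %y = two-digit year, %H = hour (0-23), %M = minute.
parsed <- strptime(raw, "%m/%d/%y %H:%M")

# as.Date() keeps only the calendar date and drops the time of day.
d <- as.Date(parsed)

print(class(raw))  # plain text before conversion
print(class(d))    # a Date object after conversion
print(format(d))
```

If the format string does not match the text (for example, day and month swapped on an entry like 12/31/12), strptime() returns NA rather than raising an error, so it is worth checking the result for missing values.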

What is the month and year of the median date in our dataset? Enter your answer as “Month Year”, without the quotes. (Ex: if the answer was 2008-03-28, you would give the answer “March 2008”, without the quotes.)

DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))
summary(DateConvert)
##         Min.      1st Qu.       Median         Mean      3rd Qu. 
## "2001-01-01" "2003-07-10" "2006-05-21" "2006-08-23" "2009-10-24" 
##         Max. 
## "2012-12-31"

2.3

Now, let’s extract the month and the day of the week, and add these variables to our data frame mvt. We can do this with two simple functions. Type the following commands in R:

mvt$Month = months(DateConvert)

mvt$Weekday = weekdays(DateConvert)

This creates two new variables in our data frame, Month and Weekday, and sets them equal to the month and weekday values that we can extract from the Date object. Lastly, replace the old Date variable with DateConvert by typing:

mvt$Date = DateConvert
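
months() and weekdays() return names in the language of the current R locale, so the labels can differ from machine to machine. A minimal sketch of a locale-independent alternative, assuming English labels are wanted, is shown below; the vectors d and day_names are made up for illustration.

```r
# Two example dates (the maximum and median dates from the summary above).
d <- as.Date(c("2012-12-31", "2006-05-21"))

# format(d, "%m") gives the month as a two-digit number, independent of locale.
month_num <- as.integer(format(d, "%m"))

# month.name is a built-in constant holding the English month names.
month_en <- month.name[month_num]

# as.POSIXlt()$wday gives the day of the week as 0 (Sunday) to 6 (Saturday).
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
weekday_en <- day_names[as.POSIXlt(d)$wday + 1]

print(month_en)
print(weekday_en)
```

Another option is to change the time locale with Sys.setlocale("LC_TIME", ...) before calling months() and weekdays(), although the accepted locale names depend on the operating system.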

Using the table command, answer the following questions.

In which month did the fewest motor vehicle thefts occur?

mvt$Month = months(DateConvert)
mvt$Weekday = weekdays(DateConvert)
table(mvt$Month)
## 
##     April    August  December  February   January      July      June 
##     15280     16572     16426     13511     16047     16801     16002 
##     March       May  November   October September 
##     15758     16035     16063     17086     16060
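
Rather than scanning the table by eye, the smallest and largest entries can be picked out programmatically. The following is a sketch on a small made-up vector named month (hypothetical data, not the mvt counts).

```r
# Toy month labels standing in for mvt$Month (values are made up).
month <- c("March", "May", "May", "June", "March", "May")

# Count how often each label occurs.
counts <- table(month)

# which.min() / which.max() return the position of the smallest / largest
# count; names() recovers the corresponding label.
fewest <- names(which.min(counts))
most <- names(which.max(counts))

# sort() orders the whole table from the smallest count to the largest.
sorted <- sort(counts)

print(fewest)
print(most)
print(sorted)
```

When several labels tie, which.min() and which.max() report only the first one, so the sorted table is the safer view if ties are possible.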

2.4

On which weekday did the most motor vehicle thefts occur?

table(mvt$Weekday)

2.5

Each observation in the dataset represents a motor vehicle theft, and the Arrest variable indicates whether an arrest was later made for this theft. Which month has the largest number of motor vehicle thefts for which an arrest was made?

table(mvt$Arrest,mvt$Month)
##        
##         April August December February January  July  June March   May
##   FALSE 14028  15243    15029    12273   14612 15477 14772 14460 14848
##   TRUE   1252   1329     1397     1238    1435  1324  1230  1298  1187
##        
##         November October September
##   FALSE    14807   15744     14812
##   TRUE      1256    1342      1248
##   TRUE   1230  1252
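
A two-way table can be indexed by row name, which makes it possible to pull out the TRUE row and find its largest entry directly. The sketch below assumes two small made-up vectors, arrest and month, in place of the real columns.

```r
# Toy data standing in for mvt$Arrest and mvt$Month (values are made up).
arrest <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
month <- c("January", "January", "March", "January", "March", "May")

# Cross-tabulate: rows are the arrest values, columns are the months.
tab <- table(arrest, month)

# Select the row of arrests; the result is a named vector of counts by month.
arrests_by_month <- tab["TRUE", ]

# The month with the most arrests.
top_month <- names(which.max(arrests_by_month))

print(tab)
print(top_month)
```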