Background Information on the Dataset

Crime is an international concern, but it is documented and handled in very different ways in different countries. In the United States, violent crimes and property crimes are recorded by the Federal Bureau of Investigation (FBI). Additionally, each city documents crime, and some cities release data regarding crime rates. The city of Chicago, Illinois releases crime data from 2001 onward online.

Chicago is the third most populous city in the United States, with a population of over 2.7 million people. The city of Chicago is shown in the map below, with the state of Illinois highlighted in red.

There are two main types of crime: violent crimes and property crimes. In this problem, we’ll focus on one specific type of property crime, called “motor vehicle theft” (sometimes referred to as grand theft auto). This is the act of stealing, or attempting to steal, a car. We’ll use some basic data analysis in R to understand motor vehicle thefts in Chicago.

Please download the file mvtWeek1.csv for this problem (do not open this file in any spreadsheet software before completing this problem, because it might change the format of the Date field). Here is a list of descriptions of the variables:

R Exercises

Read the dataset mvtWeek1.csv into R using the read.csv function, and call the data frame “mvt”. Remember to navigate to the directory on your computer containing the file mvtWeek1.csv first. It may take a few minutes to read in the data, since the file is fairly large. Then, use the str and summary functions to answer the following questions.

# Load the dataset
mvt = read.csv("mvtWeek1.csv")
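Whether text columns such as Date arrive as factors depends on how read.csv is called. The following is a minimal sketch on a tiny made-up CSV written to a temporary file (the rows and the names toy_factor and toy_char are illustrative, not taken from mvtWeek1.csv). From R 4.0.0 onward, read.csv keeps text as character unless stringsAsFactors = TRUE is passed, so your str output may show chr where this walkthrough shows Factor.

```r
# Write a tiny, made-up CSV to a temporary file (not the real mvtWeek1.csv).
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,Date,Arrest",
             "1,12/31/12 23:15,FALSE",
             "2,12/31/12 22:00,TRUE"), tmp)

# Ask explicitly for factors (the default before R 4.0.0).
toy_factor <- read.csv(tmp, stringsAsFactors = TRUE)

# Keep text columns as character (the default from R 4.0.0 onward).
toy_char <- read.csv(tmp, stringsAsFactors = FALSE)

class(toy_factor$Date)  # "factor"
class(toy_char$Date)    # "character"
class(toy_char$Arrest)  # "logical"
```

Either form works with the date conversion used later in this problem, because strptime converts its input to character first.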

How many rows of data (observations) are in this dataset?

# Display the structure of the data frame
str(mvt)
## 'data.frame':    191641 obs. of  11 variables:
##  $ ID                 : int  8951354 8951141 8952745 8952223 8951608 8950793 8950760 8951611 8951802 8950706 ...
##  $ Date               : Factor w/ 131680 levels "1/1/01 0:01",..: 42824 42823 42823 42823 42822 42821 42820 42819 42817 42816 ...
##  $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 62 72 72 72 72 72 72 72 ...
##  $ Arrest             : logi  FALSE FALSE FALSE FALSE FALSE TRUE ...
##  $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Beat               : int  623 1213 1622 724 211 2521 423 231 1021 1215 ...
##  $ District           : int  6 12 16 7 2 25 4 2 10 12 ...
##  $ CommunityArea      : int  69 24 11 67 35 19 48 40 29 24 ...
##  $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
##  $ Latitude           : num  41.8 41.9 42 41.8 41.8 ...
##  $ Longitude          : num  -87.6 -87.7 -87.8 -87.7 -87.6 ...

If you type str(mvt) in the R console, the first row of output says that this is a data frame with 191,641 observations.

How many variables are in this dataset?

# Display the structure of the data frame
str(mvt)
## 'data.frame':    191641 obs. of  11 variables:
##  $ ID                 : int  8951354 8951141 8952745 8952223 8951608 8950793 8950760 8951611 8951802 8950706 ...
##  $ Date               : Factor w/ 131680 levels "1/1/01 0:01",..: 42824 42823 42823 42823 42822 42821 42820 42819 42817 42816 ...
##  $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 62 72 72 72 72 72 72 72 ...
##  $ Arrest             : logi  FALSE FALSE FALSE FALSE FALSE TRUE ...
##  $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Beat               : int  623 1213 1622 724 211 2521 423 231 1021 1215 ...
##  $ District           : int  6 12 16 7 2 25 4 2 10 12 ...
##  $ CommunityArea      : int  69 24 11 67 35 19 48 40 29 24 ...
##  $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
##  $ Latitude           : num  41.8 41.9 42 41.8 41.8 ...
##  $ Longitude          : num  -87.6 -87.7 -87.8 -87.7 -87.6 ...

If you type str(mvt) in the R console, the first row of output says that this is a data frame with 11 variables.
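If you only need the counts, you can also ask for them directly instead of reading them from the str header. Here is a small sketch on a stand-in data frame (toy is hypothetical; on the real data you would pass mvt instead):

```r
# A made-up data frame standing in for mvt.
toy <- data.frame(ID = c(1L, 2L, 3L),
                  Arrest = c(FALSE, TRUE, FALSE))

nrow(toy)  # number of observations (rows): 3
ncol(toy)  # number of variables (columns): 2
dim(toy)   # both at once: 3 2
```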

Using the “max” function, what is the maximum value of the variable “ID”?

# Find the maximum value 
max(mvt$ID)
## [1] 9181151

You can compute the maximum value of the ID variable with max(mvt$ID).

What is the minimum value of the variable “Beat”?

# Output the summary (kable comes from the knitr package)
library(knitr)
z = summary(mvt) 
kable(z)
ID               Date                LocationDescription                    Arrest         Domestic       Beat          District       CommunityArea  Year          Latitude       Longitude
Min. :1310022    5/16/08 0:00 : 11   STREET :156564                         Mode :logical  Mode :logical  Min. : 111    Min. : 1.00    Min. : 0       Min. :2001    Min. :41.64    Min. :-87.93
1st Qu.:2832144  10/17/01 22:00: 10  PARKING LOT/GARAGE(NON.RESID.): 14852  FALSE:176105   FALSE:191226   1st Qu.: 722  1st Qu.: 6.00  1st Qu.:22     1st Qu.:2003  1st Qu.:41.77  1st Qu.:-87.72
Median :4762956  4/13/04 21:00 : 10  OTHER : 4573                           TRUE :15536    TRUE :415      Median :1121  Median :10.00  Median :32     Median :2006  Median :41.85  Median :-87.68
Mean :4968629    9/17/05 22:00 : 10  ALLEY : 2308                           NA             NA             Mean :1259    Mean :11.82    Mean :38       Mean :2006    Mean :41.84    Mean :-87.68
3rd Qu.:7201878  10/12/01 22:00: 9   GAS STATION : 2111                     NA             NA             3rd Qu.:1733  3rd Qu.:17.00  3rd Qu.:60     3rd Qu.:2009  3rd Qu.:41.92  3rd Qu.:-87.64
Max. :9181151    10/13/01 22:00: 9   DRIVEWAY - RESIDENTIAL : 1675          NA             NA             Max. :2535    Max. :31.00    Max. :77       Max. :2012    Max. :42.02    Max. :-87.52
NA               (Other) :191582     (Other) : 9558                         NA             NA             NA            NA's :43056    NA's :24616    NA            NA's :2276     NA's :2276
# Calculates the minimum
min(mvt$Beat)
## [1] 111

If you type summary(mvt) in your R console, you can see the summary statistics for each variable. This shows that the minimum value of Beat is 111. Alternatively, you could use the min function by typing min(mvt$Beat).
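The summary also reports NA's for District, CommunityArea, Latitude and Longitude. For columns like those, min and max return NA unless you ask them to skip missing values. Here is a short sketch with made-up values (the vector district is illustrative, not taken from the dataset):

```r
# Made-up values with one missing entry.
district <- c(6L, 12L, NA, 7L, 2L)

min(district)                  # NA, because one value is missing
min(district, na.rm = TRUE)    # 2
range(district, na.rm = TRUE)  # 2 12
```

Beat has no NA's row in the summary, which is why min(mvt$Beat) works without na.rm.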

How many observations have value TRUE in the Arrest variable (this is the number of crimes for which an arrest was made)?

# Output the summary
z = summary(mvt) 
kable(z)
ID               Date                LocationDescription                    Arrest         Domestic       Beat          District       CommunityArea  Year          Latitude       Longitude
Min. :1310022    5/16/08 0:00 : 11   STREET :156564                         Mode :logical  Mode :logical  Min. : 111    Min. : 1.00    Min. : 0       Min. :2001    Min. :41.64    Min. :-87.93
1st Qu.:2832144  10/17/01 22:00: 10  PARKING LOT/GARAGE(NON.RESID.): 14852  FALSE:176105   FALSE:191226   1st Qu.: 722  1st Qu.: 6.00  1st Qu.:22     1st Qu.:2003  1st Qu.:41.77  1st Qu.:-87.72
Median :4762956  4/13/04 21:00 : 10  OTHER : 4573                           TRUE :15536    TRUE :415      Median :1121  Median :10.00  Median :32     Median :2006  Median :41.85  Median :-87.68
Mean :4968629    9/17/05 22:00 : 10  ALLEY : 2308                           NA             NA             Mean :1259    Mean :11.82    Mean :38       Mean :2006    Mean :41.84    Mean :-87.68
3rd Qu.:7201878  10/12/01 22:00: 9   GAS STATION : 2111                     NA             NA             3rd Qu.:1733  3rd Qu.:17.00  3rd Qu.:60     3rd Qu.:2009  3rd Qu.:41.92  3rd Qu.:-87.64
Max. :9181151    10/13/01 22:00: 9   DRIVEWAY - RESIDENTIAL : 1675          NA             NA             Max. :2535    Max. :31.00    Max. :77       Max. :2012    Max. :42.02    Max. :-87.52
NA               (Other) :191582     (Other) : 9558                         NA             NA             NA            NA's :43056    NA's :24616    NA            NA's :2276     NA's :2276

If you type summary(mvt) in your R console, you can see the summary statistics for each variable. This shows that 15,536 observations fall under the category TRUE for the variable Arrest.
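Because Arrest is a logical vector, you can also count its TRUE values by summing it, since R treats TRUE as 1 and FALSE as 0. Here is a minimal sketch on made-up values (the vector arrest is illustrative; on the real data you would use mvt$Arrest):

```r
# Made-up logical values standing in for mvt$Arrest.
arrest <- c(FALSE, FALSE, TRUE, FALSE, TRUE)

n_arrests <- sum(arrest)     # counts the TRUE values: 2
arrest_rate <- mean(arrest)  # share of TRUE values: 0.4
table(arrest)                # FALSE and TRUE counts side by side
```

Applied to the summary above, 15,536 arrests out of 191,641 observations is roughly 8.1%.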

How many observations have a LocationDescription value of ALLEY?

# Output the summary
z = summary(mvt) 
kable(z)
ID               Date                LocationDescription                    Arrest         Domestic       Beat          District       CommunityArea  Year          Latitude       Longitude
Min. :1310022    5/16/08 0:00 : 11   STREET :156564                         Mode :logical  Mode :logical  Min. : 111    Min. : 1.00    Min. : 0       Min. :2001    Min. :41.64    Min. :-87.93
1st Qu.:2832144  10/17/01 22:00: 10  PARKING LOT/GARAGE(NON.RESID.): 14852  FALSE:176105   FALSE:191226   1st Qu.: 722  1st Qu.: 6.00  1st Qu.:22     1st Qu.:2003  1st Qu.:41.77  1st Qu.:-87.72
Median :4762956  4/13/04 21:00 : 10  OTHER : 4573                           TRUE :15536    TRUE :415      Median :1121  Median :10.00  Median :32     Median :2006  Median :41.85  Median :-87.68
Mean :4968629    9/17/05 22:00 : 10  ALLEY : 2308                           NA             NA             Mean :1259    Mean :11.82    Mean :38       Mean :2006    Mean :41.84    Mean :-87.68
3rd Qu.:7201878  10/12/01 22:00: 9   GAS STATION : 2111                     NA             NA             3rd Qu.:1733  3rd Qu.:17.00  3rd Qu.:60     3rd Qu.:2009  3rd Qu.:41.92  3rd Qu.:-87.64
Max. :9181151    10/13/01 22:00: 9   DRIVEWAY - RESIDENTIAL : 1675          NA             NA             Max. :2535    Max. :31.00    Max. :77       Max. :2012    Max. :42.02    Max. :-87.52
NA               (Other) :191582     (Other) : 9558                         NA             NA             NA            NA's :43056    NA's :24616    NA            NA's :2276     NA's :2276
# Tabulates the division of observations for the LocationDescription variable
table(mvt$LocationDescription)
## 
##                              ABANDONED BUILDING AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA     AIRPORT BUILDING NON-TERMINAL - SECURE AREA              AIRPORT EXTERIOR - NON-SECURE AREA 
##                                               4                                               4                                               1                                              24 
##                  AIRPORT EXTERIOR - SECURE AREA                             AIRPORT PARKING LOT  AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA                   AIRPORT VENDING ESTABLISHMENT 
##                                               1                                              11                                               5                                              10 
##                                AIRPORT/AIRCRAFT                                           ALLEY                                 ANIMAL HOSPITAL                                       APARTMENT 
##                                             363                                            2308                                               1                                             184 
##                                 APPLIANCE STORE                                   ATHLETIC CLUB                                            BANK                                   BAR OR TAVERN 
##                                               1                                               9                                               7                                              17 
##                                      BARBERSHOP                                   BOWLING ALLEY                                          BRIDGE                                        CAR WASH 
##                                               4                                               3                                               2                                              44 
##                                   CHA APARTMENT                         CHA PARKING LOT/GROUNDS               CHURCH/SYNAGOGUE/PLACE OF WORSHIP                                  CLEANING STORE 
##                                               5                                             405                                              56                                               3 
##                      COLLEGE/UNIVERSITY GROUNDS               COLLEGE/UNIVERSITY RESIDENCE HALL                    COMMERCIAL / BUSINESS OFFICE                               CONSTRUCTION SITE 
##                                              47                                               2                                             126                                              35 
##                               CONVENIENCE STORE                     CTA GARAGE / OTHER PROPERTY                                       CTA TRAIN                               CURRENCY EXCHANGE 
##                                               7                                             148                                               1                                               2 
##                                 DAY CARE CENTER                                DEPARTMENT STORE                          DRIVEWAY - RESIDENTIAL                                      DRUG STORE 
##                                               5                                              22                                            1675                                               8 
##                  FACTORY/MANUFACTURING BUILDING                                    FIRE STATION                                 FOREST PRESERVE                                     GAS STATION 
##                                              16                                               5                                               6                                            2111 
##                    GOVERNMENT BUILDING/PROPERTY                              GROCERY FOOD STORE                              HIGHWAY/EXPRESSWAY                       HOSPITAL BUILDING/GROUNDS 
##                                              48                                              80                                              22                                             101 
##                                     HOTEL/MOTEL                         JAIL / LOCK-UP FACILITY                  LAKEFRONT/WATERFRONT/RIVERBANK                                         LIBRARY 
##                                             124                                               1                                               4                                               4 
##                           MEDICAL/DENTAL OFFICE                             MOVIE HOUSE/THEATER                                       NEWSSTAND                    NURSING HOME/RETIREMENT HOME 
##                                               3                                              18                                               1                                              21 
##                                           OTHER                 OTHER COMMERCIAL TRANSPORTATION               OTHER RAILROAD PROP / TRAIN DEPOT                                   PARK PROPERTY 
##                                            4573                                               8                                              28                                             255 
##                  PARKING LOT/GARAGE(NON.RESID.)                 POLICE FACILITY/VEH PARKING LOT                                       RESIDENCE                                RESIDENCE-GARAGE 
##                                           14852                                             266                                            1302                                            1176 
##                         RESIDENCE PORCH/HALLWAY                   RESIDENTIAL YARD (FRONT/BACK)                                      RESTAURANT                                SAVINGS AND LOAN 
##                                              18                                            1536                                              49                                               4 
##                       SCHOOL, PRIVATE, BUILDING                        SCHOOL, PRIVATE, GROUNDS                        SCHOOL, PUBLIC, BUILDING                         SCHOOL, PUBLIC, GROUNDS 
##                                              14                                              23                                             114                                             206 
##                                        SIDEWALK                              SMALL RETAIL STORE                            SPORTS ARENA/STADIUM                                          STREET 
##                                             462                                              33                                             166                                          156564 
##                             TAVERN/LIQUOR STORE                                         TAXICAB                                 VACANT LOT/LAND                              VEHICLE-COMMERCIAL 
##                                              14                                              21                                             985                                              23 
##                          VEHICLE NON-COMMERCIAL                                       WAREHOUSE 
##                                             817                                              17

If you type summary(mvt) in your R console, you can see the summary statistics for each variable. This shows that 2,308 observations fall under the category ALLEY for the variable LocationDescription. You can also read this from table(mvt$LocationDescription).
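The full table is long, so it can be easier to pull out a single category or to sort the counts. Here is a sketch on a small made-up factor (loc is illustrative; on the real data you would use mvt$LocationDescription):

```r
# Made-up location values standing in for mvt$LocationDescription.
loc <- factor(c("STREET", "ALLEY", "STREET", "GAS STATION", "ALLEY", "STREET"))

n_alley <- sum(loc == "ALLEY")           # count one category directly: 2
counts <- table(loc)                     # counts for every category
counts["ALLEY"]                          # look up a single category by name
sorted <- sort(counts, decreasing = TRUE)
sorted                                   # most common locations first
```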

In what format are the entries in the variable Date?

# Examine the dataset
mvt$Date[1] 
## [1] 12/31/12 23:15
## 131680 Levels: 1/1/01 0:01 1/1/01 0:05 1/1/01 0:30 1/1/01 1:17 1/1/01 1:50 1/1/01 10:00 1/1/01 10:12 1/1/01 11:00 1/1/01 12:00 1/1/01 13:00 1/1/01 15:00 1/1/01 15:30 1/1/01 16:00 ... 9/9/12 9:50

If you type mvt$Date[1] in your R console, you can see that the first entry is 12/31/12 23:15. This must be in the format Month/Day/Year Hour:Minute.

Now, let’s convert these entries into a Date object in R. In your R console, type:

# Convert data into date object
DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))

This converts the variable “Date” into a Date object in R. Take a look at the variable DateConvert using the summary function.
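To see what the format string does, it helps to try it on one or two values first. In "%m/%d/%y %H:%M", %m is the month, %d the day, %y the two-digit year, %H the hour (0 to 23) and %M the minute. The sketch below uses two strings that appear in the output above. The tz = "UTC" argument is an addition made here so the result does not depend on your local time zone, and the names raw, parsed, dates and bad are illustrative.

```r
# Two date strings that appear in the output above.
raw <- c("12/31/12 23:15", "1/1/01 0:01")

# Parse month/day/two-digit-year hour:minute (tz = "UTC" is an assumption).
parsed <- strptime(raw, format = "%m/%d/%y %H:%M", tz = "UTC")

# Drop the time of day and keep only the calendar date.
dates <- as.Date(parsed)
dates  # "2012-12-31" "2001-01-01"

# A string that does not match the format is parsed as NA rather than an error.
bad <- as.Date(strptime("2012-12-31", format = "%m/%d/%y %H:%M", tz = "UTC"))
bad    # NA
```

After a conversion like this, sum(is.na(...)) on the result is a quick way to spot a wrong format string.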

What is the month and year of the median date in our dataset?

# Output the summary
summary(DateConvert)
##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "2001-01-01" "2003-07-10" "2006-05-21" "2006-08-23" "2009-10-24" "2012-12-31"

If you type summary(DateConvert), you can see that the median date is 2006-05-21, so the median falls in May 2006.
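You can also compute the median directly and format it to show only the year and month. The following sketch uses three dates taken from the summary above in place of the full DateConvert vector (the names d and med are illustrative):

```r
# Three dates from the summary above, standing in for DateConvert.
d <- as.Date(c("2001-01-01", "2006-05-21", "2012-12-31"))

med <- median(d)      # the middle date
format(med, "%Y-%m")  # year and month only: "2006-05"
```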

Now, let’s extract the month and the day of the week, and add these variables to our data frame mvt. We can do this with two simple functions. Type the following commands in R:

# Extract the month and day of the week and add them to our data frame
mvt$Month = months(DateConvert)
mvt$Weekday = weekdays(DateConvert)

This creates two new variables in our data frame, Month and Weekday, and sets them equal to the month and weekday values that we can extract from the Date object. Lastly, replace the old Date variable with DateConvert by typing:

# Replace the old variable
mvt$Date = DateConvert

In which month did the fewest motor vehicle thefts occur?

# Tabulate the number of motor vehicle thefts in each month
z = table(mvt$Month) 
kable(z)
Var1 Freq
April 15280
August 16572
December 16426
February 13511
January 16047
July 16801
June 16002
March 15758
May 16035
November 16063
October 17086
September 16060

If you type table(mvt$Month), you can see that the month with the smallest number of observations is February.
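table sorts the months alphabetically, which makes the smallest count harder to spot. The sketch below re-enters the counts from the table above as a named vector (month_counts is an illustrative name; on the real data you would start from table(mvt$Month)), finds the minimum with which.min, and reorders the months chronologically using R's built-in month.name constant.

```r
# Counts copied from the table above, in its alphabetical order.
month_counts <- c(April = 15280, August = 16572, December = 16426,
                  February = 13511, January = 16047, July = 16801,
                  June = 16002, March = 15758, May = 16035,
                  November = 16063, October = 17086, September = 16060)

fewest <- names(which.min(month_counts))  # month with the fewest thefts
fewest                                    # "February"

# Reorder from January to December using the built-in month.name vector.
chronological <- month_counts[month.name]
chronological
```

February is also the shortest month, which likely explains part of its lower count.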

On which weekday did the most motor vehicle thefts occur?

# Tabulate the number of motor vehicle thefts for each weekday
z = table(mvt$Weekday)
kable(z)
Var1 Freq
Friday 29284
Monday 27397
Saturday 27118
Sunday 26316
Thursday 27319
Tuesday 26791
Wednesday 27416

If you type table(mvt$Weekday), you can see that the weekday with the largest number of observations is Friday.
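Sorting the counts and converting them to percentages makes the comparison between weekdays easier to read. This sketch re-enters the counts from the table above as a named vector (weekday_counts is an illustrative name; on the real data you would start from table(mvt$Weekday)):

```r
# Counts copied from the table above.
weekday_counts <- c(Friday = 29284, Monday = 27397, Saturday = 27118,
                    Sunday = 26316, Thursday = 27319, Tuesday = 26791,
                    Wednesday = 27416)

# Largest count first.
ranked <- sort(weekday_counts, decreasing = TRUE)
names(ranked)[1]  # "Friday"

# Share of all thefts falling on each weekday, in percent.
share <- round(100 * ranked / sum(ranked), 1)
share
```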

Which month has the largest number of motor vehicle thefts for which an arrest was made?

# Tabulate the number of motor vehicle thefts in each month, split by whether an arrest was made
z = table(mvt$Arrest,mvt$Month)
kable(z)
       April  August  December  February  January  July   June   March  May    November  October  September
FALSE  14028  15243   15029     12273     14612    15477  14772  14460  14848  14807     15744    14812
TRUE   1252   1329    1397      1238      1435     1324   1230   1298   1187   1256      1342     1248

If you type table(mvt$Arrest,mvt$Month), you can see that the largest number of observations with Arrest=TRUE occurs in the month of January.
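With a two-way table you can pull out the TRUE row and let R find the maximum instead of scanning the row by eye. The sketch below rebuilds the table above as a matrix (tab and the other names are illustrative; on the real data, table(mvt$Arrest, mvt$Month) can be indexed the same way). It also computes the arrest rate per month with prop.table, which is a different question from the raw count.

```r
# Counts copied from the table above, columns in alphabetical month order.
months_alpha <- c("April", "August", "December", "February", "January", "July",
                  "June", "March", "May", "November", "October", "September")
tab <- rbind(
  c(14028, 15243, 15029, 12273, 14612, 15477, 14772, 14460, 14848, 14807, 15744, 14812),
  c(1252, 1329, 1397, 1238, 1435, 1324, 1230, 1298, 1187, 1256, 1342, 1248)
)
dimnames(tab) <- list(Arrest = c("FALSE", "TRUE"), Month = months_alpha)

# Number of thefts with an arrest, per month.
arrests <- tab["TRUE", ]
most_arrests <- names(which.max(arrests))
most_arrests  # "January"

# Share of thefts in each month that led to an arrest (each column sums to 1).
arrest_rate <- prop.table(tab, margin = 2)["TRUE", ]
round(arrest_rate, 3)
```

The month with the most arrests is not necessarily the month with the highest arrest rate, because the total number of thefts differs from month to month.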