Crime is an international concern, but it is documented and handled in very different ways in different countries. In the United States, violent crimes and property crimes are recorded by the Federal Bureau of Investigation (FBI). Additionally, each city documents crime, and some cities release data regarding crime rates. The city of Chicago, Illinois, releases crime data online from 2001 onward.
Chicago is the third most populous city in the United States, with a population of over 2.7 million people. The city of Chicago is shown in the map below, with the state of Illinois highlighted in red.
There are two main types of crime: violent crime and property crime. In this problem, we’ll focus on one specific type of property crime, called “motor vehicle theft” (sometimes referred to as grand theft auto): the act of stealing, or attempting to steal, a car. We’ll use some basic data analysis in R to understand the motor vehicle thefts in Chicago.
Please download the file mvtWeek1.csv for this problem. Do not open this file in any spreadsheet software before completing the problem, because doing so might change the format of the Date field. Here is a list of descriptions of the variables:
ID: a unique identifier for each observation
Date: the date the crime occurred
LocationDescription: the location where the crime occurred
Arrest: whether or not an arrest was made for the crime (TRUE if an arrest was made, and FALSE if an arrest was not made)
Domestic: whether or not the crime was a domestic crime, meaning that it was committed against a family member (TRUE if it was domestic, and FALSE if it was not domestic)
Beat: the area, or “beat”, in which the crime occurred. This is the smallest regional division defined by the Chicago Police Department.
District: the police district in which the crime occurred. Each district is composed of many beats; districts are defined by the Chicago Police Department.
CommunityArea: the community area in which the crime occurred. Since the 1920s, Chicago has been divided into what are called “community areas”, of which there are now 77. The community areas were devised in an attempt to create socially homogeneous regions.
Year: the year in which the crime occurred.
Latitude: the latitude of the location at which the crime occurred.
Longitude: the longitude of the location at which the crime occurred.
Read the dataset mvtWeek1.csv into R using the read.csv function, and call the data frame “mvt”. Remember to navigate first to the directory on your computer that contains mvtWeek1.csv. It may take a few minutes to read in the data, since the file is fairly large. Then use the str and summary functions to answer the following questions.
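# Load knitr for kable() and the dataset
library(knitr)
# stringsAsFactors = TRUE reproduces the Factor columns below (the default became FALSE in R 4.0)
mvt = read.csv("mvtWeek1.csv", stringsAsFactors = TRUE)
# Display the structure of the data frame
str(mvt)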
## 'data.frame': 191641 obs. of 11 variables:
## $ ID : int 8951354 8951141 8952745 8952223 8951608 8950793 8950760 8951611 8951802 8950706 ...
## $ Date : Factor w/ 131680 levels "1/1/01 0:01",..: 42824 42823 42823 42823 42822 42821 42820 42819 42817 42816 ...
## $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 62 72 72 72 72 72 72 72 ...
## $ Arrest : logi FALSE FALSE FALSE FALSE FALSE TRUE ...
## $ Domestic : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Beat : int 623 1213 1622 724 211 2521 423 231 1021 1215 ...
## $ District : int 6 12 16 7 2 25 4 2 10 12 ...
## $ CommunityArea : int 69 24 11 67 35 19 48 40 29 24 ...
## $ Year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
## $ Latitude : num 41.8 41.9 42 41.8 41.8 ...
## $ Longitude : num -87.6 -87.7 -87.8 -87.7 -87.6 ...
If you type str(mvt) in the R console, the first row of output says that this is a data frame with 191,641 observations.
The same first row of the str(mvt) output also says that this data frame has 11 variables.
# Find the maximum value
max(mvt$ID)
## [1] 9181151
You can compute the maximum value of the ID variable with max(mvt$ID).
# Output the summary
z = summary(mvt)
kable(z)
| ID | Date | LocationDescription | Arrest | Domestic | Beat | District | CommunityArea | Year | Latitude | Longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. :1310022 | 5/16/08 0:00 : 11 | STREET :156564 | Mode :logical | Mode :logical | Min. : 111 | Min. : 1.00 | Min. : 0 | Min. :2001 | Min. :41.64 | Min. :-87.93 | |
| 1st Qu.:2832144 | 10/17/01 22:00: 10 | PARKING LOT/GARAGE(NON.RESID.): 14852 | FALSE:176105 | FALSE:191226 | 1st Qu.: 722 | 1st Qu.: 6.00 | 1st Qu.:22 | 1st Qu.:2003 | 1st Qu.:41.77 | 1st Qu.:-87.72 | |
| Median :4762956 | 4/13/04 21:00 : 10 | OTHER : 4573 | TRUE :15536 | TRUE :415 | Median :1121 | Median :10.00 | Median :32 | Median :2006 | Median :41.85 | Median :-87.68 | |
| Mean :4968629 | 9/17/05 22:00 : 10 | ALLEY : 2308 | NA | NA | Mean :1259 | Mean :11.82 | Mean :38 | Mean :2006 | Mean :41.84 | Mean :-87.68 | |
| 3rd Qu.:7201878 | 10/12/01 22:00: 9 | GAS STATION : 2111 | NA | NA | 3rd Qu.:1733 | 3rd Qu.:17.00 | 3rd Qu.:60 | 3rd Qu.:2009 | 3rd Qu.:41.92 | 3rd Qu.:-87.64 | |
| Max. :9181151 | 10/13/01 22:00: 9 | DRIVEWAY - RESIDENTIAL : 1675 | NA | NA | Max. :2535 | Max. :31.00 | Max. :77 | Max. :2012 | Max. :42.02 | Max. :-87.52 | |
| NA | (Other) :191582 | (Other) : 9558 | NA | NA | NA | NA’s :43056 | NA’s :24616 | NA | NA’s :2276 | NA’s :2276 |
# Calculates the minimum
min(mvt$Beat)
## [1] 111
If you type summary(mvt) in your R console, you can see the summary statistics for each variable. This shows that the minimum value of Beat is 111. Alternatively, you could use the min function by typing min(mvt$Beat).
If you type summary(mvt) in your R console, you can see the summary statistics for each variable. This shows that 15,536 observations fall under the category TRUE for the variable Arrest.
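Because Arrest is a logical column, the same count can be obtained directly: R treats TRUE as 1 and FALSE as 0 in arithmetic, so sum() counts the TRUE values and mean() gives their fraction. The sketch below is a stand-in under one assumption: it rebuilds a logical vector from the TRUE and FALSE counts in the summary instead of reading the CSV. On the real data you would call sum(mvt$Arrest) and mean(mvt$Arrest).

```r
# Stand-in for mvt$Arrest, rebuilt from the summary counts (TRUE: 15536, FALSE: 176105)
arrest <- rep(c(TRUE, FALSE), times = c(15536, 176105))

n_arrests <- sum(arrest)       # TRUE counts as 1, FALSE as 0
arrest_share <- mean(arrest)   # fraction of thefts with an arrest

print(n_arrests)               # 15536
print(round(arrest_share, 4))  # 0.0811
```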
# Tabulates the division of observations for the LocationDescription variable
table(mvt$LocationDescription)
##
## ABANDONED BUILDING AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA AIRPORT BUILDING NON-TERMINAL - SECURE AREA AIRPORT EXTERIOR - NON-SECURE AREA
## 4 4 1 24
## AIRPORT EXTERIOR - SECURE AREA AIRPORT PARKING LOT AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA AIRPORT VENDING ESTABLISHMENT
## 1 11 5 10
## AIRPORT/AIRCRAFT ALLEY ANIMAL HOSPITAL APARTMENT
## 363 2308 1 184
## APPLIANCE STORE ATHLETIC CLUB BANK BAR OR TAVERN
## 1 9 7 17
## BARBERSHOP BOWLING ALLEY BRIDGE CAR WASH
## 4 3 2 44
## CHA APARTMENT CHA PARKING LOT/GROUNDS CHURCH/SYNAGOGUE/PLACE OF WORSHIP CLEANING STORE
## 5 405 56 3
## COLLEGE/UNIVERSITY GROUNDS COLLEGE/UNIVERSITY RESIDENCE HALL COMMERCIAL / BUSINESS OFFICE CONSTRUCTION SITE
## 47 2 126 35
## CONVENIENCE STORE CTA GARAGE / OTHER PROPERTY CTA TRAIN CURRENCY EXCHANGE
## 7 148 1 2
## DAY CARE CENTER DEPARTMENT STORE DRIVEWAY - RESIDENTIAL DRUG STORE
## 5 22 1675 8
## FACTORY/MANUFACTURING BUILDING FIRE STATION FOREST PRESERVE GAS STATION
## 16 5 6 2111
## GOVERNMENT BUILDING/PROPERTY GROCERY FOOD STORE HIGHWAY/EXPRESSWAY HOSPITAL BUILDING/GROUNDS
## 48 80 22 101
## HOTEL/MOTEL JAIL / LOCK-UP FACILITY LAKEFRONT/WATERFRONT/RIVERBANK LIBRARY
## 124 1 4 4
## MEDICAL/DENTAL OFFICE MOVIE HOUSE/THEATER NEWSSTAND NURSING HOME/RETIREMENT HOME
## 3 18 1 21
## OTHER OTHER COMMERCIAL TRANSPORTATION OTHER RAILROAD PROP / TRAIN DEPOT PARK PROPERTY
## 4573 8 28 255
## PARKING LOT/GARAGE(NON.RESID.) POLICE FACILITY/VEH PARKING LOT RESIDENCE RESIDENCE-GARAGE
## 14852 266 1302 1176
## RESIDENCE PORCH/HALLWAY RESIDENTIAL YARD (FRONT/BACK) RESTAURANT SAVINGS AND LOAN
## 18 1536 49 4
## SCHOOL, PRIVATE, BUILDING SCHOOL, PRIVATE, GROUNDS SCHOOL, PUBLIC, BUILDING SCHOOL, PUBLIC, GROUNDS
## 14 23 114 206
## SIDEWALK SMALL RETAIL STORE SPORTS ARENA/STADIUM STREET
## 462 33 166 156564
## TAVERN/LIQUOR STORE TAXICAB VACANT LOT/LAND VEHICLE-COMMERCIAL
## 14 21 985 23
## VEHICLE NON-COMMERCIAL WAREHOUSE
## 817 17
If you type summary(mvt) in your R console, you can see the summary statistics for each variable. This shows that 2,308 observations fall under the category ALLEY for the variable LocationDescription. You can also read this from table(mvt$LocationDescription).
# Examine the dataset
mvt$Date[1]
## [1] 12/31/12 23:15
## 131680 Levels: 1/1/01 0:01 1/1/01 0:05 1/1/01 0:30 1/1/01 1:17 1/1/01 1:50 1/1/01 10:00 1/1/01 10:12 1/1/01 11:00 1/1/01 12:00 1/1/01 13:00 1/1/01 15:00 1/1/01 15:30 1/1/01 16:00 ... 9/9/12 9:50
If you type mvt$Date[1] in your R console, you can see that the first entry is 12/31/12 23:15. Since 31 cannot be a month, the entries must be in the format Month/Day/Year Hour:Minute.
Now, let’s convert these characters into a Date object in R. In your R console, type
# Convert the Date strings into Date objects
DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))
This converts the variable “Date” into a Date object in R. Take a look at the variable DateConvert using the summary function.
# Output the summary
summary(DateConvert)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## "2001-01-01" "2003-07-10" "2006-05-21" "2006-08-23" "2009-10-24" "2012-12-31"
If you type summary(DateConvert), you can see that the median date is 2006-05-21.
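To see what the format string does, the sketch below applies the same strptime() call to the first Date entry shown earlier. Each % code in "%m/%d/%y %H:%M" matches one field (month, day, two-digit year, hour, minute), and as.Date() then keeps only the calendar date. A string that does not fit the format, such as the same timestamp written day-first, becomes NA rather than raising an error, so it may be worth checking sum(is.na(DateConvert)) on the full column.

```r
# The first Date entry from mvt, as printed by mvt$Date[1]
first_entry <- "12/31/12 23:15"

# Same format string as the DateConvert line: month/day/two-digit-year hour:minute
parsed <- strptime(first_entry, format = "%m/%d/%y %H:%M")
converted <- as.Date(parsed)  # drops the time of day
print(converted)              # "2012-12-31"

# Day-first input does not match %m (31 is not a valid month), so the result is NA
day_first <- as.Date(strptime("31/12/12 23:15", format = "%m/%d/%y %H:%M"))
print(is.na(day_first))       # TRUE
```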
Now, let’s extract the month and the day of the week, and add these variables to our data frame mvt. We can do this with two simple functions. Type the following commands in R:
# Extract the month and day of the week and add them to our data frame
mvt$Month = months(DateConvert)
mvt$Weekday = weekdays(DateConvert)
This creates two new variables in our data frame, Month and Weekday, and sets them equal to the month and weekday values extracted from the Date object. Lastly, replace the old Date variable with DateConvert by typing:
# Replace the old variable
mvt$Date = DateConvert
# Tabulate the number of motor vehicle thefts in each month
z = table(mvt$Month)
kable(z)
| Var1 | Freq |
|---|---|
| April | 15280 |
| August | 16572 |
| December | 16426 |
| February | 13511 |
| January | 16047 |
| July | 16801 |
| June | 16002 |
| March | 15758 |
| May | 16035 |
| November | 16063 |
| October | 17086 |
| September | 16060 |
If you type table(mvt$Month), you can see that the month with the smallest number of observations is February.
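table() orders the months alphabetically (April, August, December, ...), which makes the monthly pattern easy to misread. The sketch below uses the monthly counts printed above as a hand-copied stand-in for table(mvt$Month): which.min() and which.max() pick out the smallest and largest entries by name, and indexing with R's built-in month.name constant puts them in calendar order. Note that months() returns names in the current locale, so the month.name reordering assumes an English locale.

```r
# Monthly motor vehicle theft counts, copied from the table above
month_counts <- c(April = 15280, August = 16572, December = 16426,
                  February = 13511, January = 16047, July = 16801,
                  June = 16002, March = 15758, May = 16035,
                  November = 16063, October = 17086, September = 16060)

fewest <- names(which.min(month_counts))  # "February"
most <- names(which.max(month_counts))    # "October"

# Reorder into calendar order using the built-in month.name vector
calendar_order <- month_counts[month.name]
print(calendar_order)

# The twelve months together account for every observation
print(sum(month_counts))                  # 191641
```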
# Tabulate the number of motor vehicle thefts on each weekday
z = table(mvt$Weekday)
kable(z)
| Var1 | Freq |
|---|---|
| Friday | 29284 |
| Monday | 27397 |
| Saturday | 27118 |
| Sunday | 26316 |
| Thursday | 27319 |
| Tuesday | 26791 |
| Wednesday | 27416 |
If you type table(mvt$Weekday), you can see that the weekday with the largest number of observations is Friday.
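Like months(), weekdays() is locale-dependent. A quick sanity check of the conversion is to apply it to dates whose weekday is known: 1 January 2001 and 31 December 2012, the first and last dates in summary(DateConvert), both fell on a Monday. The sketch forces the C locale so the names come out in English; that call is an assumption about your setup, not part of the original analysis. It then applies which.max() to the weekday counts copied from the table above.

```r
# Force English day names (weekdays() is locale-dependent)
invisible(Sys.setlocale("LC_TIME", "C"))

# First and last dates in the data set, per summary(DateConvert)
endpoints <- as.Date(c("2001-01-01", "2012-12-31"))
print(weekdays(endpoints))  # "Monday" "Monday"

# Weekday counts, copied from the table above
weekday_counts <- c(Friday = 29284, Monday = 27397, Saturday = 27118,
                    Sunday = 26316, Thursday = 27319, Tuesday = 26791,
                    Wednesday = 27416)
busiest <- names(which.max(weekday_counts))
print(busiest)              # "Friday"

# Reorder Monday..Sunday for easier reading
week_order <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
print(weekday_counts[week_order])
```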
# Cross-tabulate Arrest (rows) by month (columns)
z = table(mvt$Arrest, mvt$Month)
kable(z)
| | April | August | December | February | January | July | June | March | May | November | October | September |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FALSE | 14028 | 15243 | 15029 | 12273 | 14612 | 15477 | 14772 | 14460 | 14848 | 14807 | 15744 | 14812 |
| TRUE | 1252 | 1329 | 1397 | 1238 | 1435 | 1324 | 1230 | 1298 | 1187 | 1256 | 1342 | 1248 |
If you type table(mvt$Arrest,mvt$Month), you can see that the largest number of observations with Arrest=TRUE occurs in the month of January.
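A cross-tabulation like this answers "which month has the most arrests" by count. Dividing each column by its total instead gives the share of that month's thefts that led to an arrest: prop.table(z, 2) does this, while prop.table(z, 1), used later in this problem, divides each row by its row total. The sketch rebuilds the table above from its printed counts; with the real data, z itself can be passed to prop.table.

```r
months_abc <- c("April", "August", "December", "February", "January", "July",
                "June", "March", "May", "November", "October", "September")
arrest_false <- c(14028, 15243, 15029, 12273, 14612, 15477,
                  14772, 14460, 14848, 14807, 15744, 14812)
arrest_true  <- c(1252, 1329, 1397, 1238, 1435, 1324,
                  1230, 1298, 1187, 1256, 1342, 1248)

# Rows = Arrest, columns = month, as in table(mvt$Arrest, mvt$Month)
z <- rbind("FALSE" = arrest_false, "TRUE" = arrest_true)
colnames(z) <- months_abc

# Month with the most arrests, by count
most_arrests <- names(which.max(z["TRUE", ]))

# Column proportions: each month's FALSE and TRUE shares sum to 1
arrest_rate <- prop.table(z, 2)["TRUE", ]
highest_rate <- names(which.max(arrest_rate))

print(most_arrests)          # "January"
print(round(arrest_rate, 3))
print(highest_rate)          # "February"
```

By count January leads, but by rate the top month is February, which has the fewest thefts overall; which comparison matters depends on the question being asked.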
First, let’s make a histogram of the variable Date. We’ll add an extra argument, breaks, to specify the number of bars we want in our histogram. In your R console, type:
# Create a histogram
hist(mvt$Date, breaks=100)
While there is not a clear trend, it looks like crime generally decreases over this time period.
Now, let’s see how arrests have changed over time by creating a boxplot of Date for each value of Arrest.
# Create a boxplot of theft dates, split by Arrest
boxplot(mvt$Date ~ mvt$Arrest)
In the boxplot, the box for Arrest=TRUE is shifted towards the bottom of the plot (earlier dates) compared with the box for Arrest=FALSE, meaning that there were more crimes for which arrests were made in the first half of the time period.
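A rough numeric check of the boxplot is to compare the median year of thefts with and without an arrest, using the yearly counts from the Arrest-by-Year table that follows. The sketch expands each count into that many copies of its year with rep(), as a stand-in for mvt$Year split by Arrest, and takes the median; on the real data, tapply(mvt$Year, mvt$Arrest, median) would give the corresponding figures directly.

```r
years <- 2001:2012
# Counts from table(mvt$Arrest, mvt$Year)
false_counts <- c(18517, 16638, 14859, 15169, 14956, 14796,
                  13068, 13425, 11327, 14796, 15012, 13542)
true_counts  <- c(2152, 2115, 1798, 1693, 1528, 1302,
                  1212, 1020, 840, 701, 625, 550)

# One entry per theft, holding the year it occurred
year_no_arrest <- rep(years, times = false_counts)
year_arrest    <- rep(years, times = true_counts)

median_no_arrest <- median(year_no_arrest)
median_arrest    <- median(year_arrest)
print(c(no_arrest = median_no_arrest, arrest = median_arrest))
```

With these counts the median year is 2006 for thefts without an arrest and 2005 for thefts with one, which is consistent with the boxplot.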
# Tabulate Arrest (rows) by year (columns)
z = table(mvt$Arrest, mvt$Year)
kable(z)
| | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FALSE | 18517 | 16638 | 14859 | 15169 | 14956 | 14796 | 13068 | 13425 | 11327 | 14796 | 15012 | 13542 |
| TRUE | 2152 | 2115 | 1798 | 1693 | 1528 | 1302 | 1212 | 1020 | 840 | 701 | 625 | 550 |
If you create a table using the command table(mvt$Arrest, mvt$Year), the column for 2001 has 2152 observations with Arrest=TRUE and 18517 observations with Arrest=FALSE. The fraction of motor vehicle thefts in 2001 for which an arrest was made is thus 2152/(2152+18517) = 0.1041173.
If you create a table using the command table(mvt$Arrest, mvt$Year), the column for 2007 has 1212 observations with Arrest=TRUE and 13068 observations with Arrest=FALSE. The fraction of motor vehicle thefts in 2007 for which an arrest was made is thus 1212/(1212+13068) = 0.08487395.
If you create a table using the command table(mvt$Arrest, mvt$Year), the column for 2012 has 550 observations with Arrest=TRUE and 13542 observations with Arrest=FALSE. The fraction of motor vehicle thefts in 2012 for which an arrest was made is thus 550/(550+13542) = 0.03902924.
The fraction of motor vehicle thefts resulting in an arrest therefore fell from about 10.4% in 2001 to about 3.9% in 2012. Since there may still be open investigations for recent crimes, this could partly explain the trend. Other factors could also be at play, and the trend should be investigated further. However, since we don’t know when the arrests were actually made, our detective work in this area has reached a dead end.
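Rather than computing the fraction one year at a time, prop.table(z, 2) divides every column of the Arrest-by-Year table by its column total at once. The sketch rebuilds that table from its printed counts; with the real data, prop.table(table(mvt$Arrest, mvt$Year), 2) should give the same proportions.

```r
years <- 2001:2012
false_counts <- c(18517, 16638, 14859, 15169, 14956, 14796,
                  13068, 13425, 11327, 14796, 15012, 13542)
true_counts  <- c(2152, 2115, 1798, 1693, 1528, 1302,
                  1212, 1020, 840, 701, 625, 550)

# Rows = Arrest, columns = year, as in table(mvt$Arrest, mvt$Year)
z <- rbind("FALSE" = false_counts, "TRUE" = true_counts)
colnames(z) <- years

# Share of thefts with an arrest, year by year
arrest_rate <- prop.table(z, 2)["TRUE", ]
print(round(arrest_rate, 4))
```

The yearly fractions are not strictly decreasing (2002 is higher than 2001, for example), but the overall drop to under 4% by 2012 is clear.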
We want to find the top five locations where motor vehicle thefts occur. If you create a table of the LocationDescription variable, it is unfortunately very hard to read since there are 78 different locations in the data set. By using the sort function, we can view this same table, but sorted by the number of observations in each category. In your R console, type:
# Sorts the tabulated data of the mvts in each location description
z = sort(table(mvt$LocationDescription))
kable(z)
| Var1 | Freq |
|---|---|
| AIRPORT BUILDING NON-TERMINAL - SECURE AREA | 1 |
| AIRPORT EXTERIOR - SECURE AREA | 1 |
| ANIMAL HOSPITAL | 1 |
| APPLIANCE STORE | 1 |
| CTA TRAIN | 1 |
| JAIL / LOCK-UP FACILITY | 1 |
| NEWSSTAND | 1 |
| BRIDGE | 2 |
| COLLEGE/UNIVERSITY RESIDENCE HALL | 2 |
| CURRENCY EXCHANGE | 2 |
| BOWLING ALLEY | 3 |
| CLEANING STORE | 3 |
| MEDICAL/DENTAL OFFICE | 3 |
| ABANDONED BUILDING | 4 |
| AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA | 4 |
| BARBERSHOP | 4 |
| LAKEFRONT/WATERFRONT/RIVERBANK | 4 |
| LIBRARY | 4 |
| SAVINGS AND LOAN | 4 |
| AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA | 5 |
| CHA APARTMENT | 5 |
| DAY CARE CENTER | 5 |
| FIRE STATION | 5 |
| FOREST PRESERVE | 6 |
| BANK | 7 |
| CONVENIENCE STORE | 7 |
| DRUG STORE | 8 |
| OTHER COMMERCIAL TRANSPORTATION | 8 |
| ATHLETIC CLUB | 9 |
| AIRPORT VENDING ESTABLISHMENT | 10 |
| AIRPORT PARKING LOT | 11 |
| SCHOOL, PRIVATE, BUILDING | 14 |
| TAVERN/LIQUOR STORE | 14 |
| FACTORY/MANUFACTURING BUILDING | 16 |
| BAR OR TAVERN | 17 |
| WAREHOUSE | 17 |
| MOVIE HOUSE/THEATER | 18 |
| RESIDENCE PORCH/HALLWAY | 18 |
| NURSING HOME/RETIREMENT HOME | 21 |
| TAXICAB | 21 |
| DEPARTMENT STORE | 22 |
| HIGHWAY/EXPRESSWAY | 22 |
| SCHOOL, PRIVATE, GROUNDS | 23 |
| VEHICLE-COMMERCIAL | 23 |
| AIRPORT EXTERIOR - NON-SECURE AREA | 24 |
| OTHER RAILROAD PROP / TRAIN DEPOT | 28 |
| SMALL RETAIL STORE | 33 |
| CONSTRUCTION SITE | 35 |
| CAR WASH | 44 |
| COLLEGE/UNIVERSITY GROUNDS | 47 |
| GOVERNMENT BUILDING/PROPERTY | 48 |
| RESTAURANT | 49 |
| CHURCH/SYNAGOGUE/PLACE OF WORSHIP | 56 |
| GROCERY FOOD STORE | 80 |
| HOSPITAL BUILDING/GROUNDS | 101 |
| SCHOOL, PUBLIC, BUILDING | 114 |
| HOTEL/MOTEL | 124 |
| COMMERCIAL / BUSINESS OFFICE | 126 |
| CTA GARAGE / OTHER PROPERTY | 148 |
| SPORTS ARENA/STADIUM | 166 |
| APARTMENT | 184 |
| SCHOOL, PUBLIC, GROUNDS | 206 |
| PARK PROPERTY | 255 |
| POLICE FACILITY/VEH PARKING LOT | 266 |
| AIRPORT/AIRCRAFT | 363 |
| CHA PARKING LOT/GROUNDS | 405 |
| SIDEWALK | 462 |
| VEHICLE NON-COMMERCIAL | 817 |
| VACANT LOT/LAND | 985 |
| RESIDENCE-GARAGE | 1176 |
| RESIDENCE | 1302 |
| RESIDENTIAL YARD (FRONT/BACK) | 1536 |
| DRIVEWAY - RESIDENTIAL | 1675 |
| GAS STATION | 2111 |
| ALLEY | 2308 |
| OTHER | 4573 |
| PARKING LOT/GARAGE(NON.RESID.) | 14852 |
| STREET | 156564 |
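The most frequent locations are at the bottom of the sorted table. Excluding the catch-all OTHER category, the top five are STREET, PARKING LOT/GARAGE(NON.RESID.), ALLEY, GAS STATION, and DRIVEWAY - RESIDENTIAL.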
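The same five names can be extracted programmatically instead of read off the table: sort the counts in decreasing order, drop OTHER, and keep the first five. The sketch uses the seven largest counts from the sorted table above as a stand-in; with the real data, the input would be sort(table(mvt$LocationDescription), decreasing = TRUE).

```r
# The seven largest location counts from the sorted table above
loc_counts <- c("RESIDENTIAL YARD (FRONT/BACK)" = 1536,
                "DRIVEWAY - RESIDENTIAL" = 1675,
                "GAS STATION" = 2111,
                "ALLEY" = 2308,
                "OTHER" = 4573,
                "PARKING LOT/GARAGE(NON.RESID.)" = 14852,
                "STREET" = 156564)

ranked <- sort(loc_counts, decreasing = TRUE)
# Drop the catch-all OTHER category, then keep the five most frequent locations
top5_names <- head(setdiff(names(ranked), "OTHER"), 5)
print(top5_names)
```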
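Next, create a subset of mvt, called Top5, that contains only these five locations.
# Create the Top5 subset
Top5 = subset(mvt, LocationDescription=="STREET" | LocationDescription=="PARKING LOT/GARAGE(NON.RESID.)" | LocationDescription=="ALLEY" | LocationDescription=="GAS STATION" | LocationDescription=="DRIVEWAY - RESIDENTIAL")
The subset still carries all 78 levels of the original LocationDescription factor, most of which now have no observations, so tables built from it would be full of empty rows. To make our tables easier to read, we can refresh this factor variable. In your R console, type: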
# Re-create the factor, keeping only the levels that still occur in Top5
Top5$LocationDescription = factor(Top5$LocationDescription)
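Two idioms can shorten these steps. %in% replaces the chain of == comparisons joined by |, and droplevels() does the same job as re-calling factor(): it removes levels that no longer have any observations. The sketch runs both on a small hypothetical data frame (toy values, not rows from mvtWeek1.csv).

```r
# Hypothetical toy data, NOT taken from mvtWeek1.csv
toy <- data.frame(
  LocationDescription = factor(c("STREET", "ALLEY", "OTHER", "STREET", "GAS STATION")),
  Arrest = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

top5_names <- c("STREET", "PARKING LOT/GARAGE(NON.RESID.)", "ALLEY",
                "GAS STATION", "DRIVEWAY - RESIDENTIAL")

# Same effect as LocationDescription == "STREET" | LocationDescription == "ALLEY" | ...
toy_top5 <- subset(toy, LocationDescription %in% top5_names)
print(levels(toy_top5$LocationDescription))  # "ALLEY" "GAS STATION" "OTHER" "STREET"

# Remove levels with no remaining observations (here, OTHER)
toy_top5$LocationDescription <- droplevels(toy_top5$LocationDescription)
print(levels(toy_top5$LocationDescription))  # "ALLEY" "GAS STATION" "STREET"
```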
# Calculate the proportion
m = table(Top5$LocationDescription, Top5$Arrest)
z = prop.table(m,1)
kable(z)
| | FALSE | TRUE |
|---|---|---|
| ALLEY | 0.8921144 | 0.1078856 |
| DRIVEWAY - RESIDENTIAL | 0.9211940 | 0.0788060 |
| GAS STATION | 0.7920417 | 0.2079583 |
| PARKING LOT/GARAGE(NON.RESID.) | 0.8920684 | 0.1079316 |
| STREET | 0.9259408 | 0.0740592 |
Gas Station has by far the highest percentage of arrests, with over 20% of motor vehicle thefts resulting in an arrest.
# Calculate the proportion
m = table(Top5$LocationDescription, Top5$Weekday)
z = prop.table(m,1)
kable(z)
| | Friday | Monday | Saturday | Sunday | Thursday | Tuesday | Wednesday |
|---|---|---|---|---|---|---|---|
| ALLEY | 0.1668111 | 0.1386482 | 0.1477470 | 0.1330156 | 0.1364818 | 0.1399480 | 0.1373484 |
| DRIVEWAY - RESIDENTIAL | 0.1534328 | 0.1522388 | 0.1205970 | 0.1319403 | 0.1570149 | 0.1450746 | 0.1397015 |
| GAS STATION | 0.1572714 | 0.1326386 | 0.1601137 | 0.1591663 | 0.1335860 | 0.1279015 | 0.1293226 |
| PARKING LOT/GARAGE(NON.RESID.) | 0.1569486 | 0.1432804 | 0.1480609 | 0.1303528 | 0.1401831 | 0.1395772 | 0.1415971 |
| STREET | 0.1518421 | 0.1424657 | 0.1416354 | 0.1389591 | 0.1424082 | 0.1398023 | 0.1428873 |
Saturday is the day where most motor vehicle thefts at gas stations occur.
Saturday is the day on which the fewest motor vehicle thefts at residential driveways occur (about 12.1% of them, the smallest share in the DRIVEWAY - RESIDENTIAL row).
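Reading the extreme value of a row by eye is error-prone: in the GAS STATION row Saturday has the largest share, whereas in the DRIVEWAY - RESIDENTIAL row it has the smallest. which.max() and which.min() make this explicit. The sketch copies those two rows of proportions from the table above; with the real data, the same calls should work on rows of z, e.g. which.max(z["GAS STATION", ]). Because prop.table(m, 1) normalizes by row, each row should sum to 1 up to rounding.

```r
days <- c("Friday", "Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday")

# Row proportions copied from the Location-by-Weekday table above
gas_station <- setNames(c(0.1572714, 0.1326386, 0.1601137, 0.1591663,
                          0.1335860, 0.1279015, 0.1293226), days)
driveway <- setNames(c(0.1534328, 0.1522388, 0.1205970, 0.1319403,
                       0.1570149, 0.1450746, 0.1397015), days)

gas_most <- names(which.max(gas_station))     # "Saturday"
driveway_fewest <- names(which.min(driveway)) # "Saturday"
driveway_most <- names(which.max(driveway))   # "Thursday"

# Each row of a row-wise prop.table sums to 1 (up to display rounding)
print(c(sum(gas_station), sum(driveway)))
```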