This report covers the pre-module assignment 2 of the course Data Mining in R. The goal of this report is to perform an exploratory analysis of the dataset of crime between January and May 2019 provided by the Houston Police Department. This report was created on Sat Jul 20 13:04:14 2019.
First, I accessed the website containing the datasets with the crime statistics for Houston - https://www.houstontx.gov/police/cs/Monthly_Crime_Data_by_Street_and_Police_Beat.htm
Monthly statistics were available for the period from January 2019 to May 2019.
I downloaded the datasets in xlsx (Excel) format and converted them to csv format for ease of import.
The datasets contain the following fields: occurrence date and hour, crime description, crime count, beat, premise type, block range, street name and suffix. For April and May, the zip code where the crime occurred is also available.
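The import step is not shown in this report. As a minimal sketch, the monthly csv files could be read and stacked as follows. The file names, the column names (`Occurrence_Date`, `ZIP_Code`, etc.) and the toy rows are assumptions made for illustration and are not taken from the actual HPD files. Months without a zip code column receive an `NA` column so that all months can be combined.

```r
# Hypothetical demo directory and file names; the real exports may differ.
data_dir <- file.path(tempdir(), "hpd_demo")
dir.create(data_dir, showWarnings = FALSE)

# Two toy monthly files: January has no zip code column, April does.
jan <- data.frame(Occurrence_Date = "2019-01-04", Occurrence_Hour = 16,
                  Description = "Aggravated Assault", Count = 6, Beat = "3B40")
apr <- data.frame(Occurrence_Date = "2019-04-19", Occurrence_Hour = 3,
                  Description = "Intimidation", Count = 0, Beat = "19G10",
                  ZIP_Code = "77002")
write.csv(jan, file.path(data_dir, "crime_2019_01.csv"), row.names = FALSE)
write.csv(apr, file.path(data_dir, "crime_2019_04.csv"), row.names = FALSE)

read_month <- function(path) {
  # Read everything as text first; types are fixed after stacking.
  df <- read.csv(path, stringsAsFactors = FALSE, colClasses = "character")
  # Add an explicit NA column when the month has no zip code information.
  if (!"ZIP_Code" %in% names(df)) df$ZIP_Code <- NA_character_
  df
}

files <- list.files(data_dir, pattern = "^crime_2019_[0-9]{2}\\.csv$",
                    full.names = TRUE)
crime <- do.call(rbind, lapply(files, read_month))

# Convert the columns that are needed as numbers or dates.
crime$Count <- as.integer(crime$Count)
crime$Occurrence_Date <- as.Date(crime$Occurrence_Date)
```

Reading every column as character first avoids type mismatches between months (for example, zip codes losing leading zeros) before the rows are bound together.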
As part of the data preparation I performed a number of tasks:
## [1] "Aggravated Assault"
## [2] "All other larceny"
## [3] "All other offenses"
## [4] "Animal Cruelty"
## [5] "Arson"
## [6] "Assisting or promoting prostitution"
## [7] "Bad checks"
## [8] "Betting/wagering"
## [9] "Bribery"
## [10] "Burglary, Breaking and Entering"
## [11] "Counterfeiting, forgery"
## [12] "Credit card, ATM fraud"
## [13] "Curfew, loitering, vagrancy violations"
## [14] "Destruction, damage, vandalism"
## [15] "Disorderly conduct"
## [16] "Driving under the influence"
## [17] "Drug equipment violations"
## [18] "Drug, narcotic violations"
## [19] "Drunkenness"
## [20] "Embezzlement"
## [21] "Extortion, Blackmail"
## [22] "False pretenses, swindle"
## [23] "Family offenses, no violence"
## [24] "Forcible fondling"
## [25] "Forcible rape"
## [26] "Forcible sodomy"
## [27] "From coin-operated machine or device"
## [28] "Gambling equipment violations"
## [29] "Hacking/Computer Invasion"
## [30] "Human Trafficking/Commercial Sex Act"
## [31] "Identify theft"
## [32] "Impersonation"
## [33] "Intimidation"
## [34] "Justifiable homicide"
## [35] "Kidnapping, abduction"
## [36] "Liquor law violations"
## [37] "Motor vehicle theft"
## [38] "Murder, non-negligent"
## [39] "Negligent manslaughter"
## [40] "Peeping tom"
## [41] "Pocket-picking"
## [42] "Pornographs, obscene material"
## [43] "Promoting gambling"
## [44] "Prostitution"
## [45] "Purchasing prostitution"
## [46] "Purse-snatching"
## [47] "Robbery"
## [48] "Runaway"
## [49] "Shoplifting"
## [50] "Simple assault"
## [51] "Statutory rape"
## [52] "Stolen property offenses"
## [53] "Theft from building"
## [54] "Theft from motor vehicle"
## [55] "Theft of motor vehicle parts or accessory"
## [56] "Trespass of real property"
## [57] "Weapon law violations"
## [58] "Welfare fraud"
## [59] "Wire fraud"
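The derived columns used in the rest of this report (`nMonth`, `nWeekDay`, `Hour`) could be computed from a date-time value roughly as shown here. This is a sketch under assumptions: the input format `"YYYY-MM-DD HH:MM"` and the toy values are hypothetical, and the rule used to build `Part_of_day` is not stated in the report, so it is left out.

```r
# Toy input in an assumed "YYYY-MM-DD HH:MM" format.
stamp <- c("2019-03-12 16:00", "2019-02-02 00:00", "2019-05-25 23:00")
dt <- as.POSIXlt(stamp, format = "%Y-%m-%d %H:%M", tz = "UTC")

crime <- data.frame(Date = as.Date(dt))

# Month and weekday labels are built by index, so they do not depend on
# the locale of the machine running the code.
crime$nMonth <- factor(month.abb[dt$mon + 1], levels = month.abb)
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
crime$nWeekDay <- factor(days[dt$wday + 1], levels = days)

# Two-digit hour kept as a factor.
crime$Hour <- factor(sprintf("%02d", dt$hour))

crime
```

Using full sets of factor levels means months or weekdays with no crimes still appear with a count of zero in summaries.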
The tables below show the top 10 and bottom 10 rows of the dataset, ordered by Count, after the aforementioned data preparation and processing tasks. Some columns have been omitted. Overall, the dataset contains 98842 rows.
Date | nMonth | nWeekDay | Part_of_day | Hour | Description | Beat | Count |
---|---|---|---|---|---|---|---|
2019-03-12 | Mar | Tue | Aft-Ev | 16 | Purchasing prostitution | 19G10 | 10 |
2019-03-15 | Mar | Fri | Aft-Ev | 20 | Intimidation | 7C30 | 10 |
2019-02-02 | Feb | Sat | Night | 00 | Destruction, damage, vandalism | 24C10 | 8 |
2019-02-13 | Feb | Wed | Night | 11 | Purchasing prostitution | 13D10 | 8 |
2019-01-26 | Jan | Sat | Night | 21 | Simple assault | 22B30 | 7 |
2019-03-09 | Mar | Sat | Night | 02 | Theft from motor vehicle | 18F20 | 7 |
2019-05-25 | May | Sat | Night | 23 | Aggravated Assault | 3B50 | 7 |
2019-01-04 | Jan | Fri | Aft-Ev | 16 | Aggravated Assault | 3B40 | 6 |
2019-01-31 | Jan | Thu | Aft-Ev | 16 | Purchasing prostitution | 19G10 | 6 |
2019-02-09 | Feb | Sat | Aft-Ev | 15 | Theft of motor vehicle parts or accessory | 18F50 | 6 |
Date | nMonth | nWeekDay | Part_of_day | Hour | Description | Beat | Count |
---|---|---|---|---|---|---|---|
2019-05-30 | May | Thu | Night | 23 | Theft from motor vehicle | 22B20 | 1 |
2019-05-30 | May | Thu | Night | 23 | Theft from motor vehicle | 2A30 | 1 |
2019-05-30 | May | Thu | Night | 23 | Theft from motor vehicle | 5F30 | 1 |
2019-05-30 | May | Thu | Night | 23 | Theft of motor vehicle parts or accessory | 12D20 | 1 |
2019-05-30 | May | Thu | Night | 23 | Theft of motor vehicle parts or accessory | 5F20 | 1 |
2019-01-11 | Jan | Fri | Aft-Ev | 15 | Motor vehicle theft | 20G60 | 0 |
2019-01-15 | Jan | Tue | Night | 11 | Motor vehicle theft | 20G50 | 0 |
2019-04-19 | Apr | Fri | Aft-Ev | 16 | Motor vehicle theft | 8C50 | 0 |
2019-04-19 | Apr | Fri | Night | 03 | Intimidation | 19G10 | 0 |
2019-04-27 | Apr | Sat | Aft-Ev | 17 | Aggravated Assault | 5F20 | 0 |
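The two tables above appear to be ordered by `Count` in descending order. A minimal sketch of how such top and bottom rows could be extracted, using a small hypothetical data frame named `crime` with made-up rows:

```r
# Toy data; the values are illustrative only.
crime <- data.frame(
  Description = c("Intimidation", "Simple assault", "Motor vehicle theft",
                  "Aggravated Assault", "Theft from motor vehicle"),
  Beat  = c("7C30", "22B30", "20G60", "3B50", "22B20"),
  Count = c(10, 7, 0, 7, 1)
)

# Sort by Count, largest first, then take both ends of the sorted table.
sorted <- crime[order(-crime$Count), ]
top_rows <- head(sorted, 2)
bottom_rows <- tail(sorted, 2)

top_rows
bottom_rows
```

With the full dataset, `head(sorted, 10)` and `tail(sorted, 10)` would give the ten rows at each end.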
The output below shows summary statistics: minimum, quartiles, median, mean and maximum for the date and numeric variables, and counts for the factor variables. Some interesting conclusions can already be drawn.
## Date nMonth nWeekDay Part_of_day
## Min. :2019-01-01 May :21468 Sun:13191 Morning: 0
## 1st Qu.:2019-02-09 Apr :20321 Mon:13584 Aft-Ev :41515
## Median :2019-03-20 Mar :19685 Tue:14105 Night :57327
## Mean :2019-03-18 Jan :19392 Wed:14732
## 3rd Qu.:2019-04-26 Feb :17976 Thu:14504
## Max. :2019-05-30 Jun : 0 Fri:14502
## (Other): 0 Sat:14224
## Hour Description Beat
## 18 : 5964 Theft from motor vehicle :11773 17E10 : 2031
## 12 : 5905 Simple assault :10951 14D20 : 1974
## 17 : 5734 All other offenses : 9068 22B20 : 1892
## 20 : 5503 Destruction, damage, vandalism : 8497 15E40 : 1867
## 19 : 5291 All other larceny : 7317 19G10 : 1835
## 21 : 5245 Burglary, Breaking and Entering: 6362 (Other):89119
## (Other):65200 (Other) :44874 NA's : 124
## Count
## Min. : 0.000
## 1st Qu.: 1.000
## Median : 1.000
## Mean : 1.061
## 3rd Qu.: 1.000
## Max. :10.000
##
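Output of this kind is what base R's `summary()` produces for a data frame: quantiles for dates and numbers, and level counts for factors. A small sketch on hypothetical data:

```r
# Toy data frame with one date, one factor and one numeric column.
crime <- data.frame(
  Date = as.Date(c("2019-01-04", "2019-03-12", "2019-05-25")),
  nWeekDay = factor(c("Fri", "Tue", "Sat"),
                    levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")),
  Count = c(6, 10, 7)
)

# Per-column summary of the whole data frame.
summary(crime)

# Summary of a single numeric column, returned as a named vector.
count_stats <- summary(crime$Count)
count_stats
```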
Below are four plots showing the number of crimes per month, hour of the day, day of the week and part of the day, respectively.
These plots confirm some of the conclusions drawn in the previous section.
First, the following table shows the top and bottom 10 counts by type of crime committed in Houston between January and May 2019.
Description | Count |
---|---|
Simple assault | 12910 |
Theft from motor vehicle | 12741 |
All other offenses | 9308 |
Destruction, damage, vandalism | 8690 |
All other larceny | 7459 |
Burglary, Breaking and Entering | 6547 |
Intimidation | 6051 |
Aggravated Assault | 5561 |
Motor vehicle theft | 4983 |
Drug, narcotic violations | 4088 |
Description | Count |
---|---|
Curfew, loitering, vagrancy violations | 23 |
Peeping tom | 13 |
Runaway | 9 |
Promoting gambling | 6 |
Welfare fraud | 5 |
Betting/wagering | 4 |
Bribery | 4 |
Gambling equipment violations | 4 |
Justifiable homicide | 4 |
Negligent manslaughter | 2 |
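The totals above are larger than the row counts in the summary output (for example, 12741 versus 11773 for theft from motor vehicle), which suggests that the table sums the `Count` column rather than counting rows. A minimal sketch of that aggregation on hypothetical toy data:

```r
# Toy data; the values are illustrative only.
crime <- data.frame(
  Description = c("Simple assault", "Bribery", "Simple assault",
                  "Robbery", "Bribery", "Simple assault"),
  Count = c(2, 1, 1, 3, 0, 1)
)

# Sum the Count column within each crime type.
by_type <- aggregate(Count ~ Description, data = crime, FUN = sum)

# Order from most to least frequent.
by_type <- by_type[order(-by_type$Count), ]
by_type
```

Applying `head()` and `tail()` to the ordered result gives the most and least frequent crime types.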
The following graph shows crime counts by type and day of the week. Several interesting observations can be drawn:
When looking at the crime types by part of the day we can observe:
Another interesting graph is the distribution of crimes on Friday and Saturday nights. The top crimes differ from those seen across all days of the week and times of the day: there are more crimes associated with alcohol consumption, such as assault, DUI or vandalism.
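The data behind a Friday and Saturday night view could be obtained by filtering on weekday and part of the day and then summing counts by crime type. A sketch on hypothetical toy data (the column names follow the tables in this report, the rows are made up):

```r
# Toy data; the values are illustrative only.
crime <- data.frame(
  nWeekDay = c("Fri", "Sat", "Mon", "Sat", "Fri"),
  Part_of_day = c("Night", "Night", "Night", "Aft-Ev", "Night"),
  Description = c("Simple assault", "Driving under the influence",
                  "Robbery", "Shoplifting", "Simple assault"),
  Count = c(2, 1, 1, 1, 1)
)

# Keep only Friday and Saturday rows that fall in the night period.
weekend_nights <- subset(crime,
                         nWeekDay %in% c("Fri", "Sat") & Part_of_day == "Night")

# Total count per crime type, most frequent first.
top_weekend <- sort(tapply(weekend_nights$Count,
                           weekend_nights$Description, sum),
                    decreasing = TRUE)
top_weekend
```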
First, the following table shows the top and bottom 10 crime counts by beat in Houston between January and May 2019. There is a lot of variation: the top 10 range from 2141 to 1869 crimes, while the bottom 10 range from 12 to only 1 crime.
Beat | Count |
---|---|
17E10 | 2141 |
14D20 | 2132 |
15E40 | 2035 |
22B20 | 2029 |
19G10 | 2016 |
1A10 | 1904 |
17E40 | 1899 |
12D10 | 1888 |
1A20 | 1885 |
7C20 | 1869 |
Beat | Count |
---|---|
5F40 | 12 |
6B50 | 11 |
21I40 | 10 |
7C50 | 8 |
HCC5 | 6 |
23J40 | 5 |
HCC7 | 4 |
21I30 | 3 |
HCC4 | 2 |
HCC3 | 1 |
The graph below shows the crimes in each beat by part of the day. While the proportions are fairly consistent, there are some beats where crimes are much more prominent at night (e.g. 1A20 or 1A30), while in other beats the afternoon/evening shows comparatively more crimes. It would be interesting to analyze whether some of the former are more residential areas and some of the latter are business/working areas, but I haven’t analyzed this further.
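To compare beats of very different sizes, it can help to look at the share of each part of the day within a beat rather than raw counts. A sketch using `xtabs()` and `prop.table()`; the beat names come from this report, but the counts are made up for illustration:

```r
# Toy data; the values are illustrative only.
crime <- data.frame(
  Beat = c("1A20", "1A20", "1A20", "17E10", "17E10", "17E10", "17E10"),
  Part_of_day = c("Night", "Night", "Aft-Ev",
                  "Aft-Ev", "Aft-Ev", "Night", "Aft-Ev"),
  Count = c(1, 2, 1, 1, 1, 1, 1)
)

# Cross-tabulate summed counts: one row per beat, one column per part of day.
counts <- xtabs(Count ~ Beat + Part_of_day, data = crime)

# Convert each row to proportions, so every beat sums to 1.
shares <- prop.table(counts, margin = 1)
shares
```

A beat with a night share well above the others would be a candidate for the residential versus business comparison mentioned above.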
First, I reviewed the list of crimes and selected the ones that were remarkably harmful, based on purely subjective criteria. I considered manslaughter, homicide, murder, rape, kidnapping, sodomy, arson and human trafficking. The goal is to analyze whether there is any observable pattern for these types of crimes.
The first table shows how sexually-related crimes (i.e. forcible rape, sodomy and fondling) are the most common. Negligent manslaughter is the least common with 2 occurrences.
Description | Count |
---|---|
Forcible rape | 278 |
Forcible fondling | 214 |
Forcible sodomy | 140 |
Arson | 96 |
Murder, non-negligent | 92 |
Kidnapping, abduction | 75 |
Human Trafficking/Commercial Sex Act | 37 |
Justifiable homicide | 4 |
Negligent manslaughter | 2 |
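A subset like this could be built by matching the `Description` column against an explicit list of crime types. In this sketch the list is copied from the table above, while the data rows are hypothetical; the exact selection code used for the report is not shown, so this is only one possible approach.

```r
# Crime types taken from the table above.
worst_types <- c("Forcible rape", "Forcible fondling", "Forcible sodomy",
                 "Arson", "Murder, non-negligent", "Kidnapping, abduction",
                 "Human Trafficking/Commercial Sex Act",
                 "Justifiable homicide", "Negligent manslaughter")

# Toy data; the values are illustrative only.
crime <- data.frame(
  Description = c("Arson", "Shoplifting", "Forcible rape", "Arson", "Robbery"),
  Count = c(1, 1, 3, 1, 1)
)

# Keep only rows whose description is in the selected list.
worst <- crime[crime$Description %in% worst_types, ]

# Total count per selected crime type, most frequent first.
worst_counts <- sort(tapply(worst$Count, worst$Description, sum),
                     decreasing = TRUE)
worst_counts
```

Matching on an explicit list avoids accidental matches that a text pattern such as "rape" would produce on other descriptions.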
In order to draw conclusions on an aggregated basis, I analyzed the incidence of the worst crimes (regardless of the description type) by day of the week, month, part of the day and beat.
The worst crimes are markedly more frequent on Fridays and Saturdays, as the weekend approaches, and at night (as opposed to the overall crime dataset, which is more concentrated in the afternoon/evening). April was also a particularly bad month for the worst crimes. In terms of beats, there is high variability in the worst crimes from beat to beat.
Next, I analyzed the breakdown of the worst crimes by part of the day, day of the week, month and hour of the day. The following graphs show the results. Several conclusions can be drawn from these graphs, e.g. the prominence of rape at night during the weekends, forcible fondling on Fridays during the day, and an increase in arson and kidnapping in May.