Table of contents

  1. Overview and Objectives
  2. Data Extraction and Transformation
  3. Exploratory Analysis
  4. Seasonality Analysis
  5. Conclusions

1. Overview and Objectives

The Houston Police Department (HPD) generates monthly crime data that includes a breakdown of Group “A” offenses for which HPD writes police reports. The data is broken down by police district and beat and is listed by street name and block range.

The purpose of this report is to show the process used to extract, transform, and load the monthly crime data generated by the Houston police, to produce descriptive analytics of the data, and to understand the main drivers of criminal activity in Houston in 2019.

2. Data Extraction and Transformation

2.1 Extraction

Houston monthly crime data is stored as Excel files on the Houston Police web source, one file per month. The first step in the process was therefore to connect to the web source, read the data from every file, and append all the records into one table. Pre-processing algorithms in R connected directly to the Excel files on the web source and retrieved the required information. The framework used for the process is shown below.

Figure: Pre-processing framework
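The append step described above can be illustrated with a minimal sketch. It is written in Python for illustration only; the report's own pipeline was written in R and read Excel files from the web source. The function name, the file labels, and the column names below are all hypothetical, and each "monthly file" is an in-memory list of rows so that the sketch runs without network access.

```python
# Illustrative only: not the report's R code. Each monthly table is a
# hypothetical in-memory list of rows instead of a downloaded Excel file.

def append_monthly_tables(monthly_tables):
    """Stack the rows of every monthly table into one combined table."""
    combined = []
    for source_name, rows in monthly_tables.items():
        for row in rows:
            # Record which monthly file each row came from.
            combined.append({**row, "source_file": source_name})
    return combined


# Hypothetical file labels and column names (assumptions, not HPD's schema).
monthly_tables = {
    "jan_2019": [{"beat": "BEAT_A", "offense": "Simple assault"}],
    "feb_2019": [
        {"beat": "BEAT_A", "offense": "Intimidation"},
        {"beat": "BEAT_B", "offense": "Simple assault"},
    ],
}

all_rows = append_monthly_tables(monthly_tables)
print(len(all_rows))
```

Keeping a source column is a design choice that makes it possible to trace any record back to its monthly file after the tables are combined.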

2.2 Transformation

Once the data was obtained and appended into one table, it was prepared for analysis so that it could be aggregated, summarized, and visualized for descriptive analytics. Preparation and transformation included:

• Convert the date columns from text to date format.

• Add new columns such as month, day of the week, day number, and day type (weekend or weekday).

• Obtain the main premise identification.

The table below displays the columns in the data.

After the transformation, the number of crimes was aggregated by the time of day, weekday, type of crime, premises, streets, and beats to create various visuals.
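The date-related preparation steps listed above can be sketched as follows. This is a Python illustration under stated assumptions, not the R code used for the report: the column name `occurrence_date`, the `%m/%d/%Y` text format, and the reading of "day number" as day of the month are all assumptions.

```python
from datetime import datetime


def add_date_columns(record, date_format="%m/%d/%Y"):
    """Return a copy of `record` with a parsed date and derived columns."""
    parsed = datetime.strptime(record["occurrence_date"], date_format).date()
    enriched = dict(record)
    enriched["occurrence_date"] = parsed          # text -> date
    enriched["month"] = parsed.month              # 1..12
    enriched["day_number"] = parsed.day           # assumed: day of the month
    enriched["weekday_index"] = parsed.weekday()  # Monday is 0, Sunday is 6
    enriched["day_type"] = "weekend" if parsed.weekday() >= 5 else "weekday"
    return enriched


# Hypothetical record; 5 January 2019 was a Saturday.
row = add_date_columns({"occurrence_date": "01/05/2019", "beat": "BEAT_A"})
print(row["day_type"], row["day_number"])
```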

2.3 Missing Values

The data contains some missing values (represented by NA). Depending on the analysis, those NA values were either excluded or kept. For example, to identify the beats with the highest and lowest number of crimes, records with NA in the beat column were excluded, as the crimes recorded in unknown beats represent only 0.1% of all observations. Variables such as suffix, block range, and postcode have a high number of NA values and were therefore not used in the analysis.
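A minimal sketch of this missing-value handling is given below, in Python for illustration (the report itself used R). The column names, the toy rows, and the 0.5 cut-off for "a high number of NA" are assumptions; the report does not state the threshold it applied.

```python
def na_share(rows, column):
    """Fraction of rows whose value in `column` is missing (None)."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


# Toy rows with hypothetical column names; None plays the role of NA.
rows = [
    {"beat": "BEAT_A", "suffix": None},
    {"beat": None, "suffix": None},
    {"beat": "BEAT_B", "suffix": "W"},
    {"beat": "BEAT_B", "suffix": None},
]

# Exclude unknown beats only for the beat-level analysis.
rows_with_beat = [row for row in rows if row["beat"] is not None]

# Skip columns that are mostly empty; the 0.5 cut-off is an assumption.
usable_columns = [c for c in ("beat", "suffix") if na_share(rows, c) <= 0.5]
print(usable_columns)
```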

3. Exploratory Analysis

Crime data is organized by day, hour, NIBRS (National Incident-Based Reporting System) description, beat, premise, street, and block range. The initial exploratory analysis focuses on when crimes occur in Houston (day and hour), which types of crime occur, and where they occur. Finally, the analysis looks for trends or seasonality in the events.

3.1 Type of Crimes (Crimes by NIBRS)

There are 59 types of crime recorded from January 2019 to May 2019. The graph below displays the most frequent types, defined as those with more than 500 events during the period. “Simple assault” and “Theft from a motor vehicle” stand out, with more than 10000 crimes in each category.

Graph 1b displays the least frequent types of crime, defined as those with fewer than 50 events during the period. “Negligent manslaughter”, “Justifiable homicide”, “Gambling equipment violations”, “Bribery” and “Betting/wagering” are the rarest, with fewer than 4 crimes in each category.
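The frequent and rare groupings described above amount to counting events per NIBRS description and applying two thresholds. The sketch below shows the idea in Python (not the report's R code), using toy data and small thresholds in place of the report's cut-offs of 500 and 50 events. The function name and the toy counts are hypothetical.

```python
from collections import Counter


def split_by_frequency(offenses, frequent_min, rare_max):
    """Return (frequent, rare) dicts of offense -> count.

    frequent: strictly more than `frequent_min` events.
    rare: strictly fewer than `rare_max` events.
    """
    counts = Counter(offenses)
    frequent = {name: n for name, n in counts.items() if n > frequent_min}
    rare = {name: n for name, n in counts.items() if n < rare_max}
    return frequent, rare


# Toy data: counts are invented for illustration only.
offenses = ["Simple assault"] * 6 + ["Intimidation"] * 3 + ["Bribery"]
frequent, rare = split_by_frequency(offenses, frequent_min=5, rare_max=2)
print(frequent, rare)
```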

3.2 Crimes Location

There are 7291 streets in the data set.

Number_of_Streets
7291

According to the graphs below, the streets with the highest number of crimes are Westheimer, Main and North (Graph 2). Each has more than 1500 crimes within the 5 months, and Westheimer has more than 2500.

In terms of premises, the dataset contains 47 premises.

Number_of_Premises
47

The most prevalent premises where crime occurs are Residence, Highway and Parking lot. Residence stands out in the graph with 37299 crimes, while Highway and Parking lot have 18226 and 17892 crimes respectively.

3.3 Crimes by Beat

There are 127 beats in the dataset.

Number_of_Beats
127

The beats with the highest number of crimes (over 1900) during the period are 17E10, 14D20, 15E40, 22B20, 19G10 and 1A10 (see Graph 4a).

The beats with the lowest number of crimes are HCC3, HCC4, 21I30, HCC7, and 23J40. Each of those beats recorded fewer than 5 crimes between January 2019 and May 2019 (see Graph 4b).
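Ranking beats by number of crimes, with unknown beats excluded, can be sketched as below. This is an illustrative Python version rather than the R code behind Graphs 4a and 4b. The beat labels and counts are invented, and `rank_beats` is a hypothetical helper.

```python
from collections import Counter


def rank_beats(beats, n):
    """Return (top_n, bottom_n) lists of (beat, count), ignoring None."""
    counts = Counter(beat for beat in beats if beat is not None)
    ranked = counts.most_common()   # highest count first
    top = ranked[:n]
    bottom = ranked[-n:][::-1]      # lowest count first
    return top, bottom


# Toy data with one unknown beat (None), which is dropped before counting.
beats = ["BEAT_A"] * 5 + ["BEAT_B"] * 3 + ["BEAT_C"] * 2 + ["BEAT_D"] + [None]
top, bottom = rank_beats(beats, n=2)
print(top, bottom)
```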

The graphs above describe the most prevalent types of crime and the locations where crimes occurred in Houston. But do the busiest beats have the same prevalent types of crime and locations?

To answer this question, the 59 types of crime were categorized as common crimes or other crimes according to the NIBRS description and graph 2. The common types for this analysis are the top 7 crimes in graph 2, such as “Simple assault”, “theft from a motor vehicle”, “Destruction, damage, vandalism”, “all other larceny”, “Burglary, Breaking and Entering” and “Intimidation”. The remaining crimes were categorized as others.

Graph 5a indicates that crimes such as “Simple assault”, “theft from a motor vehicle”, “Destruction, damage, vandalism”, “all other larceny”, “Burglary, Breaking and Entering” and “Intimidation” represent approximately 50% of the crimes in the two beats with the most reported events.

In the same way, premises were categorized as common or other. The common premises are “Residence”, “Highway” and “Parking Lot”; all remaining premises were assigned to the category “other premises”.

Graph 5b indicates that “Residence”, “Highway” and “Parking Lot” are the prevalent crime locations in the two beats with the most reported events.
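The common-versus-other categorization and the share computed for the busiest beats can be sketched as follows. This is a Python illustration, not the report's R implementation: the set of common labels is passed in by the caller, and the beat labels and records are toy values.

```python
def common_share(records, busiest_beats, common_labels):
    """Share of records in `busiest_beats` whose label is in `common_labels`.

    `records` is an iterable of (beat, label) pairs, where label is either
    a crime type or a premise type.
    """
    selected = [label for beat, label in records if beat in busiest_beats]
    if not selected:
        return 0.0
    common = sum(1 for label in selected if label in common_labels)
    return common / len(selected)


# Toy records; beat labels and counts are invented for illustration.
records = [
    ("BEAT_A", "Simple assault"),
    ("BEAT_A", "Bribery"),
    ("BEAT_B", "Intimidation"),
    ("BEAT_B", "Arson"),
    ("BEAT_C", "Simple assault"),  # not one of the busiest beats
]
share = common_share(
    records,
    busiest_beats={"BEAT_A", "BEAT_B"},
    common_labels={"Simple assault", "Intimidation"},
)
print(share)
```

The same function covers both graphs, because only the label column and the set of common labels change between crime types and premises.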

3.4 Crimes by weekday

Surprisingly, crimes are most frequent during the middle of the week, with Wednesday, Thursday, and Friday being the days with the highest number of incidents. Sunday and Monday, on the other hand, are the days with the fewest events (see Graph 7).

3.5 Crimes by Hour, Time of Day, Beats, Type of Crime and Type of Premises

Graph 8 shows that the frequency of crimes is significantly higher at midnight, at noon, and between 15:00 and 22:00. Crimes occur least frequently in the early morning, mainly between 05:00 and 06:00, when the number of reported incidents is approximately 6000. Occurrences then increase steadily through the day, peaking at noon and again in the evening between 18:00 and 19:00. The frequency then decreases through the night and declines sharply between 00:00 and 01:00.

This trend is also reviewed for the busiest beats using a heatmap (Graph 9a), which shows the frequency of crimes by time of day in those beats. The heatmap shows similar patterns of occurrence across the busy beats and across the days of the week: there is a high frequency of crimes at noon and between 15:00 and 22:00.
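A heatmap of this kind is drawn from a grid of counts per weekday and hour. The sketch below builds that grid in Python for illustration; it is not the R code behind Graph 9a, and the timestamps are invented.

```python
from collections import Counter
from datetime import datetime

WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def weekday_hour_counts(timestamps):
    """Count events per (weekday name, hour) cell, the input of a heatmap."""
    grid = Counter()
    for ts in timestamps:
        grid[(WEEKDAY_NAMES[ts.weekday()], ts.hour)] += 1
    return grid


# Toy timestamps: 2 and 9 January 2019 were Wednesdays, 5 January a Saturday.
timestamps = [
    datetime(2019, 1, 2, 12, 15),
    datetime(2019, 1, 2, 12, 40),
    datetime(2019, 1, 9, 12, 5),
    datetime(2019, 1, 5, 0, 30),
]
grid = weekday_hour_counts(timestamps)
busiest_cell, busiest_count = grid.most_common(1)[0]
print(busiest_cell, busiest_count)
```

Filtering the timestamps by beat, crime type, or premise before calling the function would give one grid per panel of a faceted heatmap.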

Graph 9b displays the frequency of crimes by weekday, time, and high-crime type. Among the high-crime categories, “All other larceny”, “All other offenses” and “Destruction, damage and Vandalism” tend to have a uniform frequency of events across weekdays and hours, as shown by the absence of red in the heatmap. On the other hand, “Theft from motor vehicle” has its highest frequency of events between 18:00 and 20:00 from Monday to Thursday. Around midnight and during the morning, from 00:00 to 10:00, there are no peaks of events on any day of the week.

Simple assault has no time slots with frequencies over 100 but, unlike the “Theft from motor vehicle” category, it shows a high occurrence of events at midnight on weekends.

Across the busiest types of premises, Graph 9c shows a high frequency of crimes in residences from 09:00 to midnight, with a peak at noon every day of the week.

4. Seasonality Analysis

The final analysis of this report covers the trend and seasonality of events during the defined period. For this analysis, the data was organized as a time series, with all events aggregated per day as shown in Graph 10. The graph shows a linear trend, as there is a long-term increase in the data. It is also possible to observe a periodic pattern across days and a seasonal variation that increases with the level of the series. However, when the time series is divided by the prevalent crime categories, the seasonality in each category shows some differences. For example, Graph 11 indicates that the peaks and drops in those categories do not occur at the same time: when simple assault crimes peak on a given day in Houston, theft from motor vehicle shows a significant decrease (see graph below).
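The daily aggregation and the notion of a linear trend can be made concrete with a small sketch. It is written in Python for illustration and uses an ordinary least-squares slope over the day index as one simple way to quantify a long-term increase. The report does not state how its trend was assessed, so this choice is an assumption, as are the toy dates.

```python
from collections import Counter
from datetime import date


def daily_counts(event_dates):
    """Aggregate events per day and return (day, count) pairs sorted by day.

    Days with no events are not filled in; a full analysis would add them
    with a count of zero.
    """
    return sorted(Counter(event_dates).items())


def trend_slope(values):
    """Least-squares slope of `values` against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Toy events: 1, 2 and 3 events on three consecutive days.
event_dates = (
    [date(2019, 1, 1)] + [date(2019, 1, 2)] * 2 + [date(2019, 1, 3)] * 3
)
series = [count for _, count in daily_counts(event_dates)]
slope = trend_slope(series)  # positive slope means a rising trend
print(series, slope)
```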

To make the peaks in those crime categories easier to see, the data was filtered to the months of April and May. Graph 12a shows that the peaks for simple assault occur on Saturdays and Sundays, and the drops on Tuesdays and Wednesdays. For “Theft from motor vehicle” the pattern is the opposite of “Simple Assault”: the drops happen on Saturdays and Sundays and the peaks on Wednesdays, Thursdays and Fridays (Graph 12b).
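One way to confirm on which weekdays a category peaks or drops is to average its daily counts by weekday. The sketch below does this in Python for illustration; it is not the report's R code, and the daily counts are toy values, not HPD figures.

```python
from collections import defaultdict
from datetime import date

WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def mean_by_weekday(daily_series):
    """Average the daily counts of one crime category per weekday name."""
    buckets = defaultdict(list)
    for day, count in daily_series:
        buckets[WEEKDAY_NAMES[day.weekday()]].append(count)
    return {name: sum(vals) / len(vals) for name, vals in buckets.items()}


# Toy daily counts for one category in April 2019 (6 and 13 April were
# Saturdays, 9 April a Tuesday).
daily_series = [
    (date(2019, 4, 6), 10),
    (date(2019, 4, 9), 4),
    (date(2019, 4, 13), 14),
]
means = mean_by_weekday(daily_series)
peak_day = max(means, key=means.get)
drop_day = min(means, key=means.get)
print(peak_day, drop_day)
```

Running the same function once per category allows the peak and drop days of two categories to be compared directly.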

5. Conclusions

• Simple assault and theft from a motor vehicle are the types of crime with high occurrence during the period;

• The streets with the highest number of crimes are Westheimer, Main and North, with Westheimer having more than 2500 crimes;

• The beats with the highest number of crimes (over 1900) during the period are 17E10, 14D20, 15E40, 22B20, 19G10 and 1A10;

• The beats with the lowest number of crimes are HCC3, HCC4, 21I30, HCC7 and 23J40;

• Premises such as “Residence”, “Highway” and “Parking Lot” are the prevalent crime locations in the two beats with the most reported events;

• Crimes such as “Simple assault”, “theft from a motor vehicle”, “Destruction, damage, vandalism”, “all other larceny”, “Burglary, Breaking and Entering” and “Intimidation” represent approximately 50% of the crimes in the two beats with the most reported events;

• Crimes are most frequent during the middle of the week, with Wednesday, Thursday, and Friday being the days with the highest number of incidents;

• The frequency of crimes is significantly higher at midnight, at noon, and between 15:00 and 22:00;

• There is a high frequency of crimes in residences at noon every day of the week; and

• The peaks for simple assault occur on Saturdays and Sundays, and the drops on Tuesdays and Wednesdays. For “Theft from motor vehicle” the pattern is the opposite of “Simple Assault”: the drops happen on Saturdays and Sundays and the peaks on Wednesdays, Thursdays and Fridays.