St. Louis City is thought to be one of the more dangerous major cities in both the Midwest and the country. Reviewing the public crime records from 2019-2020, I found that there were 47,855 recordable crimes committed within the city of St. Louis. Given this number of crimes and the city's estimated population of only 307,866 people, a resident had about a 1 in 6 chance of having a recordable crime committed against them during the 2019 calendar year.1 This is very much a blanket statistic, so I will take a deeper dive into the numbers over the course of this report.
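As a quick check of the arithmetic behind the 1 in 6 figure, the sketch below divides the crime count by the population estimate. The variable names are illustrative and are not taken from the original scripts.

```r
# Figures quoted in the paragraph above
total_crimes <- 47855
population   <- 307866

# Recorded crimes per resident, and the equivalent "1 in N" odds
crimes_per_resident <- total_crimes / population
one_in_n            <- population / total_crimes

round(crimes_per_resident, 3)  # 0.155
round(one_in_n, 1)             # 6.4
```

Note that this treats every recorded crime as affecting a different resident, which is a simplification.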
The goal of this report is to perform data analysis on the crime record database I have created and reveal insights into when and where crime was most common during 2019. Because the police force in the city of St. Louis is undermanned, roughly 140 officers short as of January 2020, this report could be used as an aid in directing the placement and timing of police presence in order to deter and prevent crime.2
To obtain, clean and then analyze the city of St. Louis crime data, I used the scripting language R. This allowed me to pull all the CSV files from the St. Louis Metropolitan Police Department website and combine them into a single R data frame.3 From there I deleted unnecessary columns, used web scraping to pull in neighborhood names and populations, and used regular expressions to simplify the factor classifications for crimes.
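A minimal sketch of the combine-and-simplify steps in base R is shown here. It is not the original preprocessing script: the file names, column names and description strings are made up for illustration, and the real police department files contain more columns.

```r
# Write two tiny stand-in monthly files to a temporary folder (invented rows)
tmp_dir <- file.path(tempdir(), "crime_csv_demo")
dir.create(tmp_dir, showWarnings = FALSE)
jan <- data.frame(Complaint = c(1, 2),
                  Description = c("LARCENY-SHOPLIFTING", "BURGLARY-RESIDENCE"),
                  stringsAsFactors = FALSE)
feb <- data.frame(Complaint = 3,
                  Description = "LARCENY-FROM VEHICLE",
                  stringsAsFactors = FALSE)
write.csv(jan, file.path(tmp_dir, "January2019.csv"), row.names = FALSE)
write.csv(feb, file.path(tmp_dir, "February2019.csv"), row.names = FALSE)

# Read every CSV in the folder and stack the pieces into one data frame
csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
monthly   <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)
crimes    <- do.call(rbind, monthly)

# Regular expression: drop everything from the first hyphen onward,
# leaving one broad crime type per row, then store it as a factor
crimes$Crime_Type <- factor(sub("-.*$", "", crimes$Description))

nrow(crimes)
table(crimes$Crime_Type)
```

`do.call(rbind, ...)` assumes every file has the same columns; files with differing layouts would need to be harmonized first.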
I then performed exploratory analysis on the dataframe to determine which crimes stood out the most, the time of their occurrence and the area of their occurrence. This exploration led to the findings that will come later in the report.
A link to the repository where I store the preprocessing script, the exploratory graph script, the final data frame csv file and the Markdown version of this report can be found in the notes section.4
The first thing I wanted to look at in this data was the breakdown of crimes committed by month:
Crimes committed by day of the week:
And finally, crimes committed by hour of the day:
From these plots you can see that, generally speaking, July is the most dangerous month of the year, Tuesdays see slightly more crime than Fridays, and crime peaks around 5PM. While this provides some insight into when crime in general occurs, it does not tell us what occurs or where it occurs. To get these insights, we will need to look at crime by neighborhood and crime type.
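The month, weekday and hour breakdowns can be sketched in base R as follows. The timestamps are made-up stand-ins for the date and time column of the crime data, and the variable names are my own assumptions.

```r
# Invented timestamps standing in for the crime data's date/time column
occurred <- as.POSIXct(
  c("2019-07-02 17:05:00", "2019-07-05 17:40:00",
    "2019-07-09 23:15:00", "2019-12-24 08:30:00"),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
)
lt <- as.POSIXlt(occurred, tz = "UTC")

# Derive month, weekday and hour from each timestamp
month_num <- lt$mon + 1   # lt$mon is 0-based, so 1 = January
weekday   <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")[lt$wday + 1]
hour      <- lt$hour      # 0-23

# Count crimes in each bucket
by_month   <- table(month_num)
by_weekday <- table(weekday)
by_hour    <- table(hour)

names(which.max(by_month))  # "7"
names(which.max(by_hour))   # "17"
```

Indexing a fixed vector of day names by `lt$wday` avoids depending on the locale, which `weekdays()` would.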
The next data to key in on is general crime by neighborhood. This will be done by grouping the records by neighborhood and graphing each neighborhood's annual crime count. Because of the high number of areas (the St. Louis police reports include 79 neighborhoods and 9 general city areas), I will only show the top ten neighborhoods, each of which contributes at least 2% of the annual total crime in St. Louis. A table with full neighborhood rankings and total crime counts can be found in the appendix.
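As a rough illustration of the ranking step, the sketch below takes the twelve highest counts from the full neighborhood table in the appendix, keeps the top ten, and expresses each as a share of all crime. Using the citywide total of 47,855 as the denominator is my assumption about how the 2% threshold was computed.

```r
# Counts for the twelve busiest neighborhoods, copied from the appendix table
crime_counts <- c(
  "Downtown" = 3477, "Dutchtown" = 2557, "Downtown West" = 2116,
  "Central West End" = 2046, "Carondelet" = 1718, "Tower Grove South" = 1414,
  "Wells/Goodfellow" = 1256, "Baden" = 1240, "Bevo Mill" = 1185,
  "JeffVanderLou" = 1106, "Gravois Park" = 1028, "Penrose" = 964
)
total_crimes <- 47855  # citywide total quoted in the introduction (assumed denominator)

# Sort descending, keep the top ten and convert to a percentage share
top_ten   <- head(sort(crime_counts, decreasing = TRUE), 10)
share_pct <- round(100 * top_ten / total_crimes, 1)

share_pct
all(share_pct >= 2)  # TRUE
```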
After that I wanted to look at crimes by count. What crimes are occurring the most often? I will show a table with the top ten crimes here. A full table can be found in the appendix.
## # A tibble: 10 x 2
## Crime_Type Crime
## <fct> <int>
## 1 Larceny 12542
## 2 All Other Offenses 8702
## 3 Aggravated Assault 3882
## 4 Vandalism 3640
## 5 Motor Vehicle Theft 3364
## 6 Other Assaults 3286
## 7 Burglary 3055
## 8 Robbery 1556
## 9 Drug Abuse Violations 1504
## 10 Weapons Possession 1039
Fig 5: The top 10 crimes by count in St. Louis City during 2019.
Along with knowing the crime counts, I wanted to see if there were general peak hours for certain crimes. To do this I plotted all 24 general types of crime by the hour they were committed, producing a time series graph that can be found in the appendix. For most crime types, the shape of the curve follows our original observation that crime seems to peak at 5PM.
From our preliminary tables and graphs, we know what neighborhoods have the most crimes, what crimes are most common and the general timing of crimes by hour, weekday and month of the year. This information, while helpful in understanding general trends, does not fully answer our question of how to deter and prevent crime. To do this we will need to drill down even further by crime type and neighborhood.
The crime type time series graph is difficult to read due to the 24 different types of crime shown. From examining the types of crime, it is clear that not every type of crime can be stopped simply by police presence. In order to make this study more effective, we will remove some of the types of crimes that do not fit into our “deter and prevent” goal.
Crimes that will be excluded going forward include: “All Other Offenses,” “Driving Under the Influence,” “Drug Abuse Violations,” “Embezzlement,” “Forgery and Counterfeiting,” “Fraud,” “Liquor Laws,” “Offenses Against the Family and Children,” “Sex Offenses,” “Stolen Property,” “Vagrancy” and “Weapons Possession.” While “Prostitution” would be a crime that would be deterred by police presence, I am not including it because there were only 3 counts in 2019.
Some of the crimes excluded above were dropped because they are not easily monitored. One cause for concern from a data integrity standpoint is the exclusion of crimes categorized as “All Other Offenses.” This category accounted for 8,702 crimes (~18.2% of all crime) and is a large part of the data set. However, the vagueness of the category makes it difficult to infer how to prevent this large set of crimes.
What crimes will we be looking at then? The crimes we will be examining fall into three categories: theft, destructive acts and violent crimes. Theft includes “Burglary,” “Larceny,” “Motor Vehicle Theft,” and “Robbery.” Destructive acts include “Arson” and “Vandalism.” The final category is violent crimes, which includes “Aggravated Assault,” “Criminal Homicide,” “Forcible Rape,” and “Other Assaults.” These categories will be important in determining where resources should be placed and what their placement will prevent.
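The grouping described above can be expressed as a simple lookup from crime type to category. The sketch below uses the 2019 counts from the full crime-type table in the appendix; the object names and the short category labels are my own.

```r
# 2019 counts for the ten retained crime types (from the appendix table)
crime_counts <- c(
  "Burglary" = 3055, "Larceny" = 12542, "Motor Vehicle Theft" = 3364,
  "Robbery" = 1556, "Arson" = 235, "Vandalism" = 3640,
  "Aggravated Assault" = 3882, "Criminal Homicide" = 217,
  "Forcible Rape" = 212, "Other Assaults" = 3286
)

# Lookup from crime type to the three categories named in the text
category_of <- c(
  "Burglary" = "Theft", "Larceny" = "Theft",
  "Motor Vehicle Theft" = "Theft", "Robbery" = "Theft",
  "Arson" = "Destructive", "Vandalism" = "Destructive",
  "Aggravated Assault" = "Violent", "Criminal Homicide" = "Violent",
  "Forcible Rape" = "Violent", "Other Assaults" = "Violent"
)

# Total recorded crimes per category
category_totals <- tapply(crime_counts, category_of[names(crime_counts)], sum)
category_totals
# Destructive: 3875, Theft: 20517, Violent: 7597
```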
With a core set of crimes to look at, I next wanted to see how each neighborhood ranks for each crime category. This will let us see whether certain preventable crimes are centered in the same neighborhoods. To do this I sorted the neighborhoods by their counts in each of the three categories of preventable crimes and plotted the top 10 neighborhoods by count. The results can be seen below:
Figure 6: Top 10 charts broken up by neighborhood for each of the three preventable crime categories
From these graphs we can see two things. First, the number of crimes scales down from left to right. The 10th highest neighborhood for theft has more total crimes than the top ten combined for violent crimes. Because of this, theft crimes will be weighted higher, as they offer more opportunities for prevention.
Second, we see that a few neighborhoods are repeat offenders across types of crime. Downtown, Dutchtown and the Central West End all rank in at least two of the three top 10 lists for our preventable crimes. Dutchtown is especially bad, as it is in the top 3 of all three categories and leads in two. Because of their high rankings and repeat appearances across categories, these three neighborhoods are ideal locations to focus on preventing and deterring crime.
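One way to find neighborhoods that recur across the category rankings is sketched below. The neighborhood labels and counts are invented placeholders (not the report's figures), the helper `top_n_names` is hypothetical, and a top 3 is used instead of a top 10 to keep the example small.

```r
# Invented per-neighborhood counts by category (placeholders only)
counts <- data.frame(
  NeighName   = c("A", "B", "C", "D", "E"),
  Theft       = c(900, 700, 650, 300, 100),
  Destructive = c(60, 150, 40, 120, 30),
  Violent     = c(80, 300, 90, 70, 250),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the n highest-count neighborhoods for one category
top_n_names <- function(df, category, n) {
  ordered <- df[order(df[[category]], decreasing = TRUE), ]
  head(ordered$NeighName, n)
}

categories <- c("Theft", "Destructive", "Violent")
top_lists  <- lapply(categories, function(ctg) top_n_names(counts, ctg, n = 3))
names(top_lists) <- categories

# Count how many of the top lists each neighborhood appears in
appearances <- table(unlist(top_lists))
recurring   <- names(appearances)[appearances >= 2]
recurring  # "A" "B" "C"
```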
Along with knowing where we need to have resources to prevent and deter crime, we need to know when these crimes occur. To do this, I created time series graphs for each of the three categories:
Figure 7: Time Series graphs for a 24 hour period for each of the three preventable crime categories.
Having these graphs is very important in determining the placement and timing of resources. The general time series plot and the crime counts by hour show that 5PM is the peak of overall crime for St. Louis City. However, when crime is broken into smaller categories, we can see that the specific peaks come later: theft peaks around 7PM, destructive crimes around 10PM and violent crimes at 11PM. This spread is actually good, as it allows police resources to be rotated to cover multiple areas over multiple time frames.
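Reading the peak hour off each category's hourly counts can be sketched as follows. The incident rows are invented to mirror the pattern described above, and `peak_hour` is a hypothetical helper rather than a function from the original scripts.

```r
# Invented incidents: category plus hour of occurrence (0-23)
incidents <- data.frame(
  Category = c("Theft", "Theft", "Theft",
               "Violent", "Violent", "Violent",
               "Destructive", "Destructive"),
  Hour     = c(19, 19, 12, 23, 23, 2, 22, 22),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the hour with the most incidents
# (ties resolve to the earliest hour, because which.max returns the first maximum)
peak_hour <- function(hours) {
  counts <- table(hours)
  as.integer(names(counts)[which.max(counts)])
}

# Apply the helper to each category's hours
peaks <- tapply(incidents$Hour, incidents$Category, peak_hour)
peaks
# Destructive: 22, Theft: 19, Violent: 23
```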
There are certain assumptions and limitations inherent in the data collected and the points made above. In terms of procedural error, there is one method that may influence the findings. By removing “All Other Offenses,” I removed roughly 18% of all crimes. This category makes up a large portion of the crimes, but because of its vague description its crimes cannot reliably be assigned to any of the three categories. While not all of these crimes would have fallen into the three preventable categories, some most likely would have, had they been better described.
Random error comes from the sample size of this data set. This report used almost 48,000 crimes to produce neighborhood crime rankings, the prevalence of certain crimes at different times and locations, and general crime statistics by hour, day, week and month. Statistics over finer time units, such as hours and days, are more reliable because each hour and weekday recurs many times within a single year, so the law of large numbers works in their favor. Statistics over larger units, such as which months of the year are most dangerous, are less likely to be accurate because a single year supplies only one observation of each month and there are no additional years to compare against.
Because of this, to truly get an understanding of how time of the year plays into crime, we would have to examine multiple years if not decades. Common sense would agree that the summer months are the most active times for people and crime, but common sense has been wrong before.
Overall, the findings of this report should be accurate as they are not strongly affected by some of the random error discussed in this section.
Looking to future reports, I am curious how current events will affect crime in general. It will be interesting to compare the findings from the 2019 crime report data with those from the 2020 data, as the COVID-19 virus is sure to have an impact on St. Louis City crime in 2020.
From our study of the St. Louis City crime reports we now know a number of things. Generally speaking, crime is highest during the summer, tends toward the early and late days of the week, and is most frequent right as the workday is ending. Crimes that can be prevented or deterred are most common in Downtown, Dutchtown and the Central West End and occur between 7PM and midnight.
Based on the data, my recommendation would be to place extra police presence in Dutchtown throughout the entire 7PM to midnight block and around Downtown and the Central West End from 7PM to 11PM. While this is not a sure way to prevent and deter all crime, it should be a good start.
## # A tibble: 88 x 2
## NeighName Crime
## <fct> <int>
## 1 Downtown 3477
## 2 Dutchtown 2557
## 3 Downtown West 2116
## 4 Central West End 2046
## 5 Carondelet 1718
## 6 Tower Grove South 1414
## 7 Wells/Goodfellow 1256
## 8 Baden 1240
## 9 Bevo Mill 1185
## 10 JeffVanderLou 1106
## 11 Gravois Park 1028
## 12 Penrose 964
## 13 Greater Ville 925
## 14 West End 854
## 15 Mount Pleasant 771
## 16 O’Fallon 753
## 17 Mark Twain 739
## 18 Midtown 718
## 19 Walnut Park East 679
## 20 Tower Grove East 656
## 21 Grand Center 643
## 22 Hamilton Heights 629
## 23 Soulard 585
## 24 Forest Park Southeast 583
## 25 Patch 581
## 26 Benton Park West 553
## 27 Walnut Park West 546
## 28 Kingsway East 545
## 29 Lindenwood Park 536
## 30 Hyde Park 529
## 31 Kingsway West 527
## 32 Academy 509
## 33 North Hampton 504
## 34 Marine Villa 491
## 35 Peabody Darst Webbe 468
## 36 Columbus Square 465
## 37 Boulevard Heights 456
## 38 The Gate District 453
## 39 Mark Twain/I-70 Industrial 450
## 40 Fairground 446
## 41 Carr Square 439
## 42 St. Louis Hills 436
## 43 Near North Riverfront 416
## 44 Southwest Garden 414
## 45 North Point 413
## 46 South Hampton 412
## 47 St. Louis Place 408
## 48 Old North St. Louis 394
## 49 Princeton Heights 384
## 50 College Hill 377
## 51 Skinker-DeBaliviere 355
## 52 Benton Park 349
## 53 Fountain Park 342
## 54 The Hill 341
## 55 Shaw 338
## 56 Fox Park 325
## 57 Vandeventer 310
## 58 North Riverfront 309
## 59 Holly Hills 270
## 60 Lewis Place 261
## 61 The Ville 260
## 62 DeBaliviere Place 253
## 63 McKinley Heights 237
## 64 Forest Park 222
## 65 LaSalle Park 222
## 66 Clayton-Tamm 213
## 67 Kosciusko 212
## 68 Clifton Heights 196
## 69 Cheltenham 192
## 70 Tiffany 189
## 71 Ellendale 170
## 72 Lafayette Square 170
## 73 Franz Park 163
## 74 Hi-Pointe 161
## 75 Botanical Heights 150
## 76 Visitation Park 123
## 77 Kings Oak 83
## 78 Compton Heights 82
## 79 Riverview 77
## 80 Carondelet Park 58
## 81 Cal-Bel Cemetery 43
## 82 Wilmore Park 41
## 83 Tower Grove Park 40
## 84 Fairgrounds Park 38
## 85 Wydown/Skinker 36
## 86 O'Fallon Park 35
## 87 Penrose Park 26
## 88 Botanical Garden 3
Fig B: A table showing total crime counts for all 79 neighborhoods and 9 public areas in descending order.
## # A tibble: 24 x 2
## Crime_Type Crime
## <fct> <int>
## 1 Larceny 12542
## 2 All Other Offenses 8702
## 3 Aggravated Assault 3882
## 4 Vandalism 3640
## 5 Motor Vehicle Theft 3364
## 6 Other Assaults 3286
## 7 Burglary 3055
## 8 Robbery 1556
## 9 Drug Abuse Violations 1504
## 10 Weapons Possession 1039
## 11 Fraud 995
## 12 Vagrancy 993
## 13 Disorderly Conduct 970
## 14 Stolen Property 723
## 15 Liquor Laws 257
## 16 Arson 235
## 17 Sex Offenses 225
## 18 Criminal Homicide 217
## 19 Forcible Rape 212
## 20 Forgery and Counterfeiting 180
## 21 Driving Under the Influence 121
## 22 Embezzlement 88
## 23 Offenses Against the Family and Children 66
## 24 Prostitution 3
Fig C: A table showing all 24 types of crime in descending order of number of offenses recorded in 2019.