Abstract

St. Louis City is thought to be one of the more dangerous major cities in both the Midwest and the country. Reviewing the public crime records from 2019-2020, I found that there were 47,855 recordable crimes committed within the city of St. Louis. Given that the estimated population of the city is only 307,866 people, a resident had about a 1 in 6 chance of having a recordable crime committed against them during the 2019 calendar year.[1] This is very much a blanket statistic, so I will do a deeper dive into the numbers over the course of this report.
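The arithmetic behind the "1 in 6" figure can be sketched as follows. This is a rough illustration in Python, not part of the original R analysis, and it assumes that every recorded crime affects a different resident and that all victims live in the city; both assumptions overstate an individual's risk.

```python
# Figures quoted in the paragraph above.
crimes_recorded = 47_855   # recordable crimes in the data set
population = 307_866       # estimated city population

# Simplifying assumption: one distinct resident victim per recorded crime.
rate = crimes_recorded / population
one_in_n = population / crimes_recorded

print(f"Crimes per resident: {rate:.3f}")
print(f"Roughly 1 in {one_in_n:.1f} residents")
```

Rounded to a whole number, the result matches the "about 1 in 6" figure used in this report.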

The goal of this report is to perform data analysis on the crime record database I have created and reveal insights into when and where crime was most common during 2019. Because the St. Louis police force is understaffed, roughly 140 officers short as of January 2020, this report could be used as an aid in directing the placement and timing of police presence in order to deter and prevent crime.[2]

Methods

To obtain, clean and then analyze the St. Louis crime data, I used the scripting language R. This allowed me to pull all the CSV files from the St. Louis Metropolitan Police Department website and combine them into a single R data frame.[3] From there I deleted unnecessary columns, used web scraping to pull in neighborhood names and populations, and used regular expressions to simplify the factor classifications for crimes.
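The combine-and-simplify step described above can be sketched as follows. This is a minimal Python illustration rather than the R preprocessing script used for this report; the column names (`Description`, `Neighborhood`), the sample rows, the regular-expression patterns and the fallback label are all hypothetical and chosen only to show the idea.

```python
import csv
import io
import re

# Hypothetical monthly extracts; the real SLMPD files and column names may differ.
JANUARY = (
    "Description,Neighborhood\n"
    "LARCENY-SHOPLIFT UNDER $500,35\n"
    "BURGLARY-RESIDENCE/FORCED ENTRY,16\n"
)
FEBRUARY = (
    "Description,Neighborhood\n"
    "LARCENY-FROM MTR VEH OVER $500,35\n"
    "ASSAULT-AGGRAVATED/KNIFE,16\n"
)

# Ordered (pattern, label) pairs; the first pattern that matches wins.
CRIME_PATTERNS = [
    (re.compile(r"^LARCENY"), "Larceny"),
    (re.compile(r"^BURGLARY"), "Burglary"),
    (re.compile(r"AGG"), "Aggravated Assault"),
]


def simplify(description):
    """Collapse a detailed description into a general crime type."""
    for pattern, label in CRIME_PATTERNS:
        if pattern.search(description):
            return label
    return "All Other Offenses"  # assumed fallback for unmatched descriptions


def combine(*csv_texts):
    """Read several CSV texts into one list of rows with a Crime_Type field."""
    rows = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            row["Crime_Type"] = simplify(row["Description"])
            rows.append(row)
    return rows


combined = combine(JANUARY, FEBRUARY)
for row in combined:
    print(row["Crime_Type"], "<-", row["Description"])
```

Keeping the patterns in an ordered list makes the classification rules easy to audit and extend, which matters when many detailed descriptions must map onto a small set of general types.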

I then performed exploratory analysis on the dataframe to determine which crimes stood out the most, the time of their occurrence and the area of their occurrence. This exploration led to the findings that will come later in the report.

A link to the repository where I store the preprocessing script, the exploratory graph script, the final data frame CSV file and the Markdown version of this report can be found in the notes section.[4]

Analysis

The first thing I wanted to look at was the breakdown of crimes committed by month:


Crimes committed by day of the week:



and finally crimes committed by hour of the day:


From these plots you can see that, generally speaking, July is the most dangerous month of the year, Tuesdays see slightly more crime than Fridays, and crime peaks around 5PM. While this provides some insight into when crime in general occurs, it does not tell us what occurs or where it occurs. To get these insights, we will need to look at crime by neighborhood and crime type.
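The three breakdowns above all come from the same operation: parse each record's timestamp and count records per month, weekday or hour. A minimal Python sketch of that counting follows. The timestamps and their format are hypothetical and are not taken from the SLMPD data, so the counts say nothing about the real distribution.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps, for illustration only.
SAMPLE = [
    "2019-07-02 17:05",
    "2019-07-05 17:40",
    "2019-07-09 23:15",
    "2019-01-15 08:30",
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def count_by(timestamps, key):
    """Count records by a key derived from each parsed timestamp."""
    parsed = (datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps)
    return Counter(key(dt) for dt in parsed)


by_month = count_by(SAMPLE, lambda dt: dt.month)                 # 1 = January
by_weekday = count_by(SAMPLE, lambda dt: WEEKDAYS[dt.weekday()])
by_hour = count_by(SAMPLE, lambda dt: dt.hour)                   # 0 to 23

print("Busiest month:", by_month.most_common(1))
print("Busiest weekday:", by_weekday.most_common(1))
print("Busiest hour:", by_hour.most_common(1))
```

Passing the grouping key as a function keeps one counting routine for all three views instead of three near-identical loops.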

The next data to key in on is general crime by neighborhood. This will be done by grouping records by neighborhood and graphing each neighborhood's annual crime count. Because the St. Louis police reports cover a large number of areas (79 neighborhoods and 9 general city areas), I will only show the top ten neighborhoods, each of which contributes at least 2% of the annual total crime in St. Louis. A table with full neighborhood rankings and total crime counts can be found in the appendix.
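The ranking and the 2% threshold can be checked with a short Python sketch. The counts are copied from the first twelve rows of the neighborhood table in the appendix; using the city-wide total of 47,855 as the denominator is an assumption about how the share was computed.

```python
# Counts copied from the top of the neighborhood table in the appendix.
NEIGHBORHOOD_COUNTS = {
    "Downtown": 3477, "Dutchtown": 2557, "Downtown West": 2116,
    "Central West End": 2046, "Carondelet": 1718, "Tower Grove South": 1414,
    "Wells/Goodfellow": 1256, "Baden": 1240, "Bevo Mill": 1185,
    "JeffVanderLou": 1106, "Gravois Park": 1028, "Penrose": 964,
}
TOTAL_CRIMES = 47_855  # assumed denominator: all recorded crimes in the city


def top_n_with_share(counts, total, n=10):
    """Return the n largest (name, count, share of total) tuples."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]
    return [(name, count, count / total) for name, count in ranked]


top_ten = top_n_with_share(NEIGHBORHOOD_COUNTS, TOTAL_CRIMES)
for name, count, share in top_ten:
    print(f"{name:<20}{count:>6}{share:>8.1%}")
```

Under that assumption every neighborhood in the top ten clears the 2% mark, with the tenth entry at roughly 2.3%.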

After that I wanted to look at crimes by count. What crimes are occurring the most often? I will show a table with the top ten crimes here. A full table can be found in the appendix.

## # A tibble: 10 x 2
##    Crime_Type            Crime
##    <fct>                 <int>
##  1 Larceny               12542
##  2 All Other Offenses     8702
##  3 Aggravated Assault     3882
##  4 Vandalism              3640
##  5 Motor Vehicle Theft    3364
##  6 Other Assaults         3286
##  7 Burglary               3055
##  8 Robbery                1556
##  9 Drug Abuse Violations  1504
## 10 Weapons Possession     1039

Figure 5: The top 10 crimes by count in St. Louis City during 2019.


Along with knowing the crime counts, I wanted to see if there were general peak hours for certain crimes. To do this I plotted all 24 general types of crime by hour committed and produced a time series graph, which can be found in the appendix. The curves for most crime types follow our original observation that crime seems to peak at 5PM.

From our preliminary tables and graphs, we know what neighborhoods have the most crimes, what crimes are most common and the general timing of crimes by hour, weekday and month of the year. This information, while helpful in understanding general trends, does not fully answer our question of how to deter and prevent crime. To do this we will need to drill down even further by crime type and neighborhood.

The crime type time series graph is difficult to read due to the 24 different types of crime shown. From examining the types of crime, it is clear that not every type of crime can be stopped simply by police presence. In order to make this study more effective, we will remove some of the types of crimes that do not fit into our “deter and prevent” goal.

Crimes that will be excluded going forward include: “All Other Offenses,” “Driving Under the Influence,” “Drug Abuse Violations,” “Embezzlement,” “Forgery and Counterfeiting,” “Fraud,” “Liquor Laws,” “Offenses Against the Family and Children,” “Sex Offenses,” “Stolen Property,” “Vagrancy” and “Weapons Possession.” While “Prostitution” would be a crime that would be deterred by police presence, I am not including it because there were only 3 counts in 2019.

Some of the crimes listed above are excluded because they are not easily monitored. One cause for concern from a data integrity standpoint is the exclusion of crimes categorized as “All Other Offenses.” This category accounted for 8,702 crimes (~18.2% of all crime) and is a large part of the data set. However, the vagueness of the category makes it difficult to infer how to prevent this large set of crimes.

Which crimes will we be looking at, then? The crimes we will be examining fall into three categories: theft, destructive acts and violent crimes. Theft includes “Burglary,” “Larceny,” “Motor Vehicle Theft,” and “Robbery.” Destructive acts include “Arson” and “Vandalism.” The final category is violent crimes, which includes “Aggravated Assault,” “Criminal Homicide,” “Forcible Rape,” and “Other Assaults.” These categories will be important in determining where resources should be placed and what their placement will prevent.
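To make the grouping concrete, the following Python sketch sums the 2019 counts from the full crime table in the appendix into the three categories and recomputes the share of “All Other Offenses.” It is an illustration only; the dictionary layout and variable names are hypothetical.

```python
# 2019 counts copied from the full crime table in the appendix.
CRIME_COUNTS = {
    "Burglary": 3055, "Larceny": 12542, "Motor Vehicle Theft": 3364,
    "Robbery": 1556, "Arson": 235, "Vandalism": 3640,
    "Aggravated Assault": 3882, "Criminal Homicide": 217,
    "Forcible Rape": 212, "Other Assaults": 3286,
}
ALL_OTHER_OFFENSES = 8_702
TOTAL_CRIMES = 47_855

# The three "deter and prevent" categories defined in the text.
CATEGORIES = {
    "Theft": ["Burglary", "Larceny", "Motor Vehicle Theft", "Robbery"],
    "Destructive acts": ["Arson", "Vandalism"],
    "Violent crimes": ["Aggravated Assault", "Criminal Homicide",
                       "Forcible Rape", "Other Assaults"],
}

category_totals = {
    category: sum(CRIME_COUNTS[crime] for crime in crimes)
    for category, crimes in CATEGORIES.items()
}
kept_share = sum(category_totals.values()) / TOTAL_CRIMES
other_share = ALL_OTHER_OFFENSES / TOTAL_CRIMES

for category, total in category_totals.items():
    print(f"{category:<18}{total:>7}")
print(f"Share of all crime kept: {kept_share:.1%}")
print(f"Share in 'All Other Offenses': {other_share:.1%}")
```

By these counts the three categories together cover about two-thirds of all recorded crime, with theft by far the largest of the three.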

With a core set of crimes to look at, I next wanted to see how each neighborhood ranks for each crime category. This will let us see if certain preventable crimes are concentrated in the same neighborhoods. To do this I sorted the neighborhoods by counts of each of the three categories of preventable crimes. I then plotted the top 10 neighborhoods by count. The results can be seen below:


Figure 6: Top 10 charts broken up by neighborhood for each of the three preventable crime categories


From these graphs we can see two things. First, the number of crimes scales down from left to right. The 10th highest neighborhood for theft has more total crimes than the top ten combined for violent crimes. Because of this, theft crimes will be weighted higher, as they offer more opportunities for prevention.

Second, we see that a few neighborhoods are repeat offenders across types of crime. Downtown, Dutchtown and the Central West End all rank in at least two of the three top 10 lists for our preventable crimes. Dutchtown is especially bad, as it is in the top three of all three categories while leading in two. Due to their high rankings and repeat appearances across categories, these three neighborhoods are the most promising locations to focus on preventing and deterring crime.
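The ranking and the search for repeat neighborhoods can be sketched in a few lines of Python. The records below are hypothetical and far smaller than the real data set, so the output illustrates the method only and does not reproduce the rankings in Figure 6.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, category) records, for illustration only.
RECORDS = [
    ("Dutchtown", "Theft"), ("Downtown", "Theft"), ("Downtown", "Theft"),
    ("Dutchtown", "Violent crimes"), ("Dutchtown", "Violent crimes"),
    ("Central West End", "Theft"), ("Downtown", "Destructive acts"),
]


def top_neighborhoods(records, n=10):
    """Rank neighborhoods by count within each crime category."""
    per_category = defaultdict(Counter)
    for neighborhood, category in records:
        per_category[category][neighborhood] += 1
    return {cat: counts.most_common(n) for cat, counts in per_category.items()}


def repeat_neighborhoods(rankings, min_lists=2):
    """Neighborhoods that appear in at least min_lists category rankings."""
    appearances = Counter(
        name for ranked in rankings.values() for name, _ in ranked
    )
    return sorted(name for name, k in appearances.items() if k >= min_lists)


rankings = top_neighborhoods(RECORDS)
repeats = repeat_neighborhoods(rankings)
print(rankings["Theft"])
print(repeats)
```

Separating the ranking from the overlap check makes it easy to change the list length or the number of lists a neighborhood must appear in.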

Along with knowing where we need to have resources to prevent and deter crime, we need to know when these crimes occur. To do this, I created time series graphs for each of the three categories:


Figure 7: Time Series graphs for a 24 hour period for each of the three preventable crime categories.


Having these graphs is very important in determining the placement and timing of resources. The general time series plot and the crime counts by hour show that 5PM is the peak of overall crime for St. Louis City. However, when the data is broken into smaller categories, we can see that the specific peaks are later: theft peaks around 7PM, destructive crimes around 10PM and violent crimes at 11PM. This spread is actually good, as it allows police resources to be rotated to cover multiple areas over multiple time frames.
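Finding the peak hour for each category amounts to counting records per hour within each category and taking the largest count. A minimal Python sketch follows; the records are hypothetical and were made up to mirror the shape of the result, not drawn from the crime data.

```python
from collections import Counter, defaultdict

# Hypothetical (category, hour of day) records, for illustration only.
RECORDS = [
    ("Theft", 19), ("Theft", 19), ("Theft", 17),
    ("Destructive acts", 22), ("Destructive acts", 22), ("Destructive acts", 3),
    ("Violent crimes", 23), ("Violent crimes", 23), ("Violent crimes", 1),
]


def peak_hours(records):
    """Return the most frequent hour (0 to 23) for each category."""
    hourly = defaultdict(Counter)
    for category, hour in records:
        hourly[category][hour] += 1
    return {cat: counts.most_common(1)[0][0] for cat, counts in hourly.items()}


peaks = peak_hours(RECORDS)
for category, hour in peaks.items():
    print(f"{category}: peak at {hour}:00")
```

With real data, ties or broad plateaus are possible, so it is worth plotting the full hourly curve alongside a single peak value.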

Error Analysis

There are certain assumptions and limitations inherent in the data collected and the points made above. There is one procedural choice that may influence the findings. By removing “All Other Offenses,” I removed roughly 18% of all crimes. This category makes up a large portion of the crimes, but its vague description means it cannot be reliably assigned to one of the three categories. While not all of these crimes would have been included in the three preventable categories, some most likely would have been had they been described more precisely.

Random error comes from the sample size of this data set. This report used almost 48,000 crimes to estimate neighborhood crime rankings, the prevalence of certain crimes at different times and locations, and general crime statistics based on hours, days, weeks and months. Statistics based on recurring units, such as hours of the day and days of the week, are more reliable because each unit is observed many times within a single year, so the law of large numbers applies. However, statistics that focus on larger groups, such as which months of the year are most dangerous, are less likely to be accurate, as each month is observed only once and there are no other years to compare against.

Because of this, to truly get an understanding of how time of the year plays into crime, we would have to examine multiple years if not decades. Common sense would agree that the summer months are the most active times for people and crime, but common sense has been wrong before.

Overall, the neighborhood and time-of-day findings of this report should be accurate, as they are not strongly affected by the random error discussed in this section.

Looking to future reports, I am curious how current events will affect crime in general. It will be interesting to compare the findings from the 2019 crime report data with those from the 2020 data, as the COVID-19 virus is sure to have an impact on St. Louis City crime in 2020.

Conclusion

From our study of the St. Louis City crime reports we now know a number of things. Generally speaking, crime is highest during the summer, tends to be highest early in the week and at the end of the work week, and is most frequent right as the workday ends. Crimes that can be prevented or deterred are most common in Downtown, Dutchtown and the Central West End and occur most often between 7PM and midnight.

Based on the data, my recommendations would be to have extra police presence in Dutchtown throughout the entire 7PM to midnight block and to place extra police presence around Downtown and the Central West End from 7PM to 11PM. While this is not a sure way to prevent and deter all crime, it should be a good start.

Notes

  1. “St. Louis City, Missouri Population 2020,” (website), http://worldpopulationreview.com/us-counties/mo/st-louis-city-population/ (accessed February 4, 2020).

  2. Jaclyn Driscoll, “Missouri Lawmakers Likely To End Residency Requirement For St. Louis Police,” (website), https://news.stlpublicradio.org/post/missouri-lawmakers-likely-end-residency-requirement-st-louis-police#stream/0 (accessed February 4, 2020).

  3. “SLMPD Downloadable Crime Files,” (website), https://www.slmpd.org/Crimereports.shtml (accessed January 16, 2020).

  4. “STL-Crime-Stats-2019,” (website), https://github.com/dpericich/STL-Crime-Stats-2019 (accessed March 29, 2020).

Appendix

Figure A

Figure B

## # A tibble: 88 x 2
##    NeighName                  Crime
##    <fct>                      <int>
##  1 Downtown                    3477
##  2 Dutchtown                   2557
##  3 Downtown West               2116
##  4 Central West End            2046
##  5 Carondelet                  1718
##  6 Tower Grove South           1414
##  7 Wells/Goodfellow            1256
##  8 Baden                       1240
##  9 Bevo Mill                   1185
## 10 JeffVanderLou               1106
## 11 Gravois Park                1028
## 12 Penrose                      964
## 13 Greater Ville                925
## 14 West End                     854
## 15 Mount Pleasant               771
## 16 O'Fallon                     753
## 17 Mark Twain                   739
## 18 Midtown                      718
## 19 Walnut Park East             679
## 20 Tower Grove East             656
## 21 Grand Center                 643
## 22 Hamilton Heights             629
## 23 Soulard                      585
## 24 Forest Park Southeast        583
## 25 Patch                        581
## 26 Benton Park West             553
## 27 Walnut Park West             546
## 28 Kingsway East                545
## 29 Lindenwood Park              536
## 30 Hyde Park                    529
## 31 Kingsway West                527
## 32 Academy                      509
## 33 North Hampton                504
## 34 Marine Villa                 491
## 35 Peabody Darst Webbe          468
## 36 Columbus Square              465
## 37 Boulevard Heights            456
## 38 The Gate District            453
## 39 Mark Twain/I-70 Industrial   450
## 40 Fairground                   446
## 41 Carr Square                  439
## 42 St. Louis Hills              436
## 43 Near North Riverfront        416
## 44 Southwest Garden             414
## 45 North Point                  413
## 46 South Hampton                412
## 47 St. Louis Place              408
## 48 Old North St. Louis          394
## 49 Princeton Heights            384
## 50 College Hill                 377
## 51 Skinker-DeBaliviere          355
## 52 Benton Park                  349
## 53 Fountain Park                342
## 54 The Hill                     341
## 55 Shaw                         338
## 56 Fox Park                     325
## 57 Vandeventer                  310
## 58 North Riverfront             309
## 59 Holly Hills                  270
## 60 Lewis Place                  261
## 61 The Ville                    260
## 62 DeBaliviere Place            253
## 63 McKinley Heights             237
## 64 Forest Park                  222
## 65 LaSalle Park                 222
## 66 Clayton-Tamm                 213
## 67 Kosciusko                    212
## 68 Clifton Heights              196
## 69 Cheltenham                   192
## 70 Tiffany                      189
## 71 Ellendale                    170
## 72 Lafayette Square             170
## 73 Franz Park                   163
## 74 Hi-Pointe                    161
## 75 Botanical Heights            150
## 76 Visitation Park              123
## 77 Kings Oak                     83
## 78 Compton Heights               82
## 79 Riverview                     77
## 80 Carondelet Park               58
## 81 Cal-Bel Cemetery              43
## 82 Wilmore Park                  41
## 83 Tower Grove Park              40
## 84 Fairgrounds Park              38
## 85 Wydown/Skinker                36
## 86 O'Fallon Park                 35
## 87 Penrose Park                  26
## 88 Botanical Garden               3

Figure B: A table showing total crime counts for all 79 neighborhoods and 9 public areas in descending order.

Figure C

## # A tibble: 24 x 2
##    Crime_Type                               Crime
##    <fct>                                    <int>
##  1 Larceny                                  12542
##  2 All Other Offenses                        8702
##  3 Aggravated Assault                        3882
##  4 Vandalism                                 3640
##  5 Motor Vehicle Theft                       3364
##  6 Other Assaults                            3286
##  7 Burglary                                  3055
##  8 Robbery                                   1556
##  9 Drug Abuse Violations                     1504
## 10 Weapons Possession                        1039
## 11 Fraud                                      995
## 12 Vagrancy                                   993
## 13 Disorderly Conduct                         970
## 14 Stolen Property                            723
## 15 Liquor Laws                                257
## 16 Arson                                      235
## 17 Sex Offenses                               225
## 18 Criminal Homicide                          217
## 19 Forcible Rape                              212
## 20 Forgery and Counterfeiting                 180
## 21 Driving Under the Influence                121
## 22 Embezzlement                                88
## 23 Offenses Against the Family and Children    66
## 24 Prostitution                                 3

Figure C: A table showing all 24 types of crime in descending order of number of offenses recorded in 2019.

Figure D