Introduction

The second project of this course aimed to reinforce a basic understanding of common statistical measures and analyses applied to geospatial problems. We applied these concepts to point data on crime occurrences in St. Louis, Missouri, recorded in 2013 and 2014. The analysis was carried out in the statistical software R, using RStudio.

We aimed to achieve three goals throughout the course of this analysis:

1) Discuss the overall findings (e.g. patterns, trends, clustering) of the crime data set

2) Describe the statistical distribution, variance, values, and outliers of the crime data set

3) Create visuals (e.g. plots and maps) to support the overall findings and descriptive statistics of the data set

Meeting the goals above requires the use and practice of both descriptive (exploratory) and inferential (confirmatory) spatial analytic methods (Rogerson, P., 2001).

Methodology

To perform our analysis we used R and RStudio to manipulate our data set, create visual graphics, compute descriptive statistics, infer statistically relevant patterns and trends, and discuss the overall findings of the data. Several R packages supported this work, including dplyr for data manipulation and leaflet for interactive mapping.

The procedural steps used to complete this project followed the nature of statistical thinking. The first two considerations relate to the data: its relevance and how to obtain it (Rogerson, P., 2001). Our initial data set was provided through the course website, so no further acquisition was needed. To make the data relevant to our questions, we used the dplyr package to group, filter, and summarize the crime data set into a more useful structure for analysis (e.g. grouping by crime type, temporal metrics, and crime count). Next, we established the basis for our assumptions. The analysts had no prior knowledge of the data set, which helped keep our assumptions unbiased, and we took an objective, exploratory approach to understanding the nature of the crime data. We then laid out our arguments, which concerned crime type patterns over the given time period in the specified location. Lastly, we formulated descriptive questions about the attributes of the data set: What are the common crime count values? What is the range of crime count values? Are there any outliers? What does the distribution look like? Beyond these descriptive measures, we asked how the different types of crime in St. Louis compare and whether there are monthly or yearly temporal patterns (Rogerson, P., 2001).
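The grouping and summarizing step described above can be sketched as follows. This is a minimal illustration in base R, not the project's actual code: the `crimes` data frame, its column names, and its values are hypothetical stand-ins for the course data set.

```r
# Toy incident records (hypothetical values, NOT the course data set).
crimes <- data.frame(
  crimetype = c("arson", "arson", "dui", "homicide",
                "dui", "arson", "homicide", "dui"),
  date = as.Date(c("2013-01-05", "2013-01-20", "2013-01-11", "2013-02-02",
                   "2013-02-14", "2014-08-09", "2014-08-15", "2014-08-21"))
)

# Derive the temporal metrics used for grouping.
crimes$year  <- as.integer(format(crimes$date, "%Y"))
crimes$month <- as.integer(format(crimes$date, "%m"))

# Each row is one incident, so summing a column of ones gives a count.
crimes$n <- 1L

# Crime count per crime type, year, and month.
monthly_counts <- aggregate(n ~ crimetype + year + month, data = crimes, FUN = sum)

# Total crime count per crime type over the whole period.
type_totals <- aggregate(n ~ crimetype, data = crimes, FUN = sum)

print(monthly_counts)
print(type_totals)
```

With dplyr, the same two summaries would typically be written with `group_by()` followed by `summarise()`; base R is used here only so the sketch runs without additional packages.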

This methodology allowed us to answer the above questions from an unbiased vantage point on crime in St. Louis, Missouri, between 2013 and 2014.

Results

Descriptive Statistics

We began with the core application of statistics: describing and understanding our crime data set (O’Sullivan, D. & Unwin, D., 2010). We wanted to understand common measures of central tendency (mean and median) and spread (range, standard deviation, and interquartile range).

We examined the distribution of the data at multiple scales, both global and local:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   143.0   164.5   186.0   185.7   207.0   228.0

From the global summary above and the corresponding Figure 1, it is fair to infer that the distribution of the data is fairly normal. The mean is approximately equal to the median, the range of values is small, and the quartiles are balanced between the minimum, median, and maximum values. The similarity of the mean and median suggests no significant skew in the overall distribution (O’Sullivan, D. & Unwin, D., 2010). We can also infer that the typical value (median or mean) is approximately 186 crimes, the range of crime counts is 85 (228 minus 143), and there is no obvious evidence of outliers at the global scale.
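The values in the global summary match the three crime-type totals reported in the Overall Findings section (228 arson, 186 DUI, 143 homicide). Assuming those totals are the inputs, which the report does not state explicitly, the summary and the range of 85 can be reproduced with a short sketch:

```r
# Total incidents per crime type, as reported in the Overall Findings section.
# Assumption: these three totals are the values behind the global summary.
totals <- c(arson = 228, dui = 186, homicide = 143)

# Minimum, quartiles, median, mean, and maximum.
print(summary(totals))

# Range expressed as a single number: maximum minus minimum.
crime_range <- diff(range(totals))
print(crime_range)

# A mean close to the median is the basis for the "no significant skew" reading.
mean_minus_median <- mean(totals) - median(totals)
print(mean_minus_median)
```

With only three values, the quartiles are interpolated between neighboring totals, so this summary describes the spread across crime types rather than a full distribution.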

Below, in Figure 2, we examine the data at a finer scale: the distribution of monthly counts for each crime type. The summary statistics and corresponding figure show more variance in the values for each crime type. Typical (median) values of 20, 17, and 11 correspond to the arson, DUI, and homicide crime types, respectively. The range of values is fairly small except for arson, which has a clear outlier at the maximum end: a value of 45 in the month of August. Overall, the distributions of monthly counts by crime type are fairly normal, with a slight skew in arson due to its extreme outlier.

## # A tibble: 3 × 8
##   crimetype  mean median    sd   min    Q1    Q3   max
##   <chr>     <dbl>  <int> <dbl> <int> <dbl> <dbl> <int>
## 1 arson      20.7     20  9.96     9  15.5  21.5    45
## 2 dui        16.9     17  6.27     8  13    20.5    27
## 3 homicide   13       11  6.32     7   8    16      28
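One common way to check the outlier reading of this table is the 1.5 × IQR fence rule. The report does not say which rule was used, so the rule itself is an assumption here; the numbers are copied from the table above.

```r
# Quartiles and extremes copied from the summary table above.
stats <- data.frame(
  crimetype = c("arson", "dui", "homicide"),
  q1  = c(15.5, 13, 8),
  q3  = c(21.5, 20.5, 16),
  min = c(9, 8, 7),
  max = c(45, 27, 28)
)

# Fences under the 1.5 * IQR rule (assumed; not stated in the report).
stats$iqr         <- stats$q3 - stats$q1
stats$lower_fence <- stats$q1 - 1.5 * stats$iqr
stats$upper_fence <- stats$q3 + 1.5 * stats$iqr

# A crime type has at least one outlier if its max or min lies beyond a fence.
stats$high_outlier <- stats$max > stats$upper_fence
stats$low_outlier  <- stats$min < stats$lower_fence

print(stats[, c("crimetype", "iqr", "upper_fence", "max", "high_outlier")])
```

Under this rule only arson is flagged: its maximum of 45 exceeds the upper fence of 30.5. The homicide maximum of 28 sits exactly on its fence and is therefore not counted as an outlier.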

Overall Findings

Temporal

The overall findings of the data set revealed some notable patterns and attributes from both a spatial and a temporal vantage point. First, the data set documented three crime types between 2013 and 2014 in St. Louis, Missouri. Arson was the most frequent crime with 228 documented instances, followed by 186 DUIs and 143 homicides (Figure 3).

After examining the global crime count per crime type over the entire time period, we took a more in-depth look at the temporal variability across crime types to detect any patterns or trends. Figure 4, below, depicts the total crime count for each month between 2013 and 2014. Variability in total crime incidents across months is minimal, with the exception of a sharp spike in August. This spike is largely due to the increased occurrence of arson incidents noted earlier. Upon further research, this spike, specifically in 2014, could be related to the Ferguson unrest, a series of protests and riots in response to a fatal shooting by a police officer (The New York Times, 2015). Spatial analysis is still necessary to support this hypothesis.

Figure 5, below, shows the crime count for each month in both years of the data set. Arson reaches its peak incident count in September 2013, followed by a rapid decline through the end of the year. Homicides and DUIs fluctuate in a seemingly normal pattern throughout 2013, with a notable decline in DUIs as the year ends. The 2014 monthly crime counts tell a different story. DUIs rise sharply at the beginning of the year, peak in April, and then decline rapidly. Arson incidents rise steeply from the start of 2014 until March and then hold relatively steady until the data ends in August 2014. Homicides continue to fluctuate throughout. This year-by-year view of monthly crime counts suggests that our hypothesis of increased arson incidents around the time of the Ferguson unrest may not align with the data.
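The difference between pooling months across years and splitting them by year can be illustrated with a small sketch. The counts below are hypothetical numbers chosen only to show the mechanics; they are not the project's data and say nothing about the actual result.

```r
# Toy monthly arson counts (hypothetical, NOT the project data).
arson <- data.frame(
  year  = c(2013, 2013, 2013, 2014, 2014),
  month = c(7, 8, 9, 7, 8),
  n     = c(10, 30, 12, 11, 15)
)

# Pooled view: months summed across both years.
pooled <- tapply(arson$n, arson$month, sum)
peak_month <- names(which.max(pooled))

# Split view: one row per year, one column per month.
# Note: xtabs fills year/month pairs without records with 0, which is
# not the same as a month with no data collection.
by_year <- xtabs(n ~ year + month, data = arson)

# Which year contributes most to the pooled peak month?
peak_year <- names(which.max(by_year[, peak_month]))

print(pooled)
print(by_year)
cat("Pooled peak month:", peak_month, "driven mostly by", peak_year, "\n")
```

A spike in the pooled view only supports an event-based explanation if the split view shows the spike in the year of that event.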

Spatial

Lastly, we examined the underlying spatial patterns and trends within the crime data set for St. Louis, Missouri. To do so, we used a crime data set aggregated by neighborhood in the hope of discerning more representative spatial patterns. Additionally, we included a St. Louis city boundary shapefile to contain our crime incidents and support a more in-depth understanding of the spatial distribution of our point features. Figure 6 shows a grid of four crime incident maps bounded by the St. Louis city shapefile. From upper left to lower right, the maps show total crimes, arson incidents, DUI incidents, and homicide incidents aggregated by neighborhood. Arson incidents appear to cluster strongly in the northwest and south regions of St. Louis. The northwest region does correspond to the Ferguson area. DUIs are fairly evenly dispersed throughout the city, and homicides form a linear cluster in the northern region along an east-west band.

Our last spatial analysis culminated in a synced, side-by-side map view of the 2013 and 2014 crime data. We felt this was an important addition to the overall analysis because it allows users to explore the aggregated crime data sets across time more easily. To achieve this, we grouped the crime type subsets by year and used a leaflet map with distinctly colored circle symbology to distinguish the different crimes. Figure 7, below, is the result of this procedure. In this figure red represents arson, blue DUI, and green homicide. The visualization shows that homicide and arson follow similar patterns across the two years, but DUIs show a clear increase in the southern region of the city in 2014 compared to 2013. Unfortunately, due to formatting issues, the synchronized map would not work in the final report. The screenshot below depicts its capabilities, with a link in the caption to the HTML RMarkdown report.
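The per-year grouping and the color scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the report's actual code: the `incidents` data frame, its coordinates, and the helper `make_year_map()` are hypothetical, and the map-building step only runs if the leaflet package is installed.

```r
# Color scheme from the text: red = arson, blue = DUI, green = homicide.
crime_colours <- c(arson = "red", dui = "blue", homicide = "green")

# Toy incidents (hypothetical coordinates, NOT the project data).
incidents <- data.frame(
  crimetype = c("arson", "dui", "homicide", "arson", "dui", "homicide"),
  year      = c(2013, 2013, 2013, 2014, 2014, 2014),
  lon       = c(-90.25, -90.20, -90.23, -90.27, -90.21, -90.24),
  lat       = c(38.66, 38.60, 38.68, 38.67, 38.58, 38.69)
)

# Look up each incident's color by crime type.
incidents$colour <- unname(crime_colours[as.character(incidents$crimetype)])

# Hypothetical helper: one leaflet map of colored circles for a single year.
make_year_map <- function(df, yr) {
  d <- df[df$year == yr, ]
  m <- leaflet::leaflet(d)
  m <- leaflet::addTiles(m)
  leaflet::addCircleMarkers(m, lng = ~lon, lat = ~lat,
                            color = ~colour, radius = 5, label = ~crimetype)
}

# Build the two maps only when leaflet is available.
if (requireNamespace("leaflet", quietly = TRUE)) {
  map_2013 <- make_year_map(incidents, 2013)
  map_2014 <- make_year_map(incidents, 2014)
  # Assumption: a package such as leafsync could then link the two views,
  # e.g. leafsync::sync(map_2013, map_2014); the report does not name
  # the package it used for synchronization.
}
```

Keeping the color lookup in a named vector means the same mapping can be reused for a legend or for static plots.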

Conclusion

This project highlights the importance of spatial and temporal aggregation in geospatial analysis. Different aggregation approaches (grouping crime incidents by month, year, or crime type) revealed distinct patterns and trends in the data set. While overall crime counts appeared relatively stable, temporal analysis identified notable spikes in certain months, particularly driven by arson incidents. Spatial visualizations further revealed clustering patterns across neighborhoods and differences between crime types.

These results demonstrate how spatial analytical techniques coupled with visualization tools can provide insights into crime distributions and trends. More advanced spatial statistical methods, such as kernel density estimation, spatial autocorrelation measures, or hot spot detection, could further improve the understanding of clustering patterns within the city.

Sources

O’Sullivan, D. & Unwin, D. (2010). Geographic Information Analysis (Appendix A). John Wiley & Sons.

Rogerson, P. (2001). Introduction to Statistical Analysis in Geography. Statistical Methods for Geography (Chapter 1). London: Sage Publications.

The New York Times (August 10th, 2015). What Happened in Ferguson? Retrieved March 7th, 2026, from https://www.nytimes.com/interactive/2014/08/13/us/ferguson-missouri-town-under-siege-after-police-shooting.html/