Geographic phenomena are often observed and documented, but finding insights in those data requires applying a range of methods. Geographic Information Analysis (GIA) is the application of spatial, statistical, and visualization methods that help to uncover those insights. Spatial and statistical analysis are deeply interwoven but attempt to answer different questions. Spatial analysis focuses on spatial relationships such as clustering, adjacency, topology, and distance to tell you where something is happening and why it is happening there; its results are typically visualized through maps and color. Statistical analysis generally tells you what is happening and is visualized through charts and graphs.
In this project we will use an exploratory approach together with descriptive analysis to examine crime data for St Louis from 2013 to 2014. This flexible process should allow us to uncover insights and suggest hypotheses for further study. We will work with two crime datasets, one aggregated and one of individual crime records, and relate them to the city of St Louis, its neighborhoods, and its police districts.
The data is stored in two folders that we will need to access.
With the working directory set, we can list the files we will be using (shown below): the two .csv files mentioned above, plus three shapefiles that appear to represent the St Louis boundary (stl_boundary), St Louis neighborhoods (nbrhds_wards), and St Louis police districts (STL-Police-Districts-2014-2).
## [1] "crimeStLouis20132014b.csv" "crimeStLouis20132014b_agg.csv"
## [1] "__MACOSX" "nbrhds_wards"
## [3] "STL-Police-Districts-2014-2" "STL Police Districts - pre-2014"
## [5] "stl_boundary" "stl_boundary.dbf"
## [7] "stl_boundary.prj" "stl_boundary.sbn"
## [9] "stl_boundary.sbx" "stl_boundary.shp"
## [11] "stl_boundary.shp.xml" "stl_boundary.shx"
## [13] "stl_boundary_ll.cpg" "stl_boundary_ll.dbf"
## [15] "stl_boundary_ll.prj" "stl_boundary_ll.sbn"
## [17] "stl_boundary_ll.sbx" "stl_boundary_ll.shp"
## [19] "stl_boundary_ll.shp.xml" "stl_boundary_ll.shx"
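A listing like the one above can be produced with base R's `list.files()`. The sketch below is illustrative only. The folder names `crime_dir` and `shp_dir` are hypothetical placeholders. Empty files created in a temporary directory stand in for the real data, so the snippet runs on its own. Building full paths with `file.path()` also avoids depending on the working directory.

```r
# Hypothetical stand-ins for the two data folders; point these at the real folders.
base_dir  <- file.path(tempdir(), "stl_demo")
crime_dir <- file.path(base_dir, "crime")
shp_dir   <- file.path(base_dir, "shapefiles")
dir.create(crime_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(shp_dir, recursive = TRUE, showWarnings = FALSE)

# Empty placeholder files named after some of the files in the listing above.
invisible(file.create(file.path(crime_dir, c("crimeStLouis20132014b.csv",
                                             "crimeStLouis20132014b_agg.csv"))))
invisible(file.create(file.path(shp_dir, c("stl_boundary.shp", "stl_boundary.dbf",
                                           "stl_boundary.shx", "stl_boundary.prj"))))

# Everything in the crime folder, then only the .shp geometry files.
csv_files <- list.files(crime_dir)
shp_only  <- list.files(shp_dir, pattern = "\\.shp$")
print(csv_files)
print(shp_only)
```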
Opening crimeStLouis20132014b.csv and looking at the first 5 rows of crime data (Figure 3) shows what we have to work with throughout this analysis.
Each crime record has a spatial component in ‘xL’ and ‘yL’, assumed to be longitude and latitude. The ‘district’ and ‘Neighborho’ fields are also spatial. These last two fields should tie in well with the shapefiles we have to work with (‘STL-Police-Districts-2014-2’ and ‘nbrhds_wards’). Each crime also has a temporal element in the ‘month’, ‘year’, and ‘codemonth’ fields.
Some fields (‘recno’, ‘crimet’) will be hard to analyse without further information, because their meaning cannot be inferred without prior knowledge.
##   recno crimetype        xL       yL year month count codemonth crimet district Neighborho
## 1 45419  homicide -90.29043 38.64133 2013     9     1   2013-09  10000        2         82
## 2 45417  homicide -90.29043 38.64133 2013     9     1   2013-09  10000        2         82
## 3  5825  homicide -90.28759 38.65942 2014     7     1   2014-07  10000        5         48
## 4  5826  homicide -90.28759 38.65942 2014     7     1   2014-07  10000        5         48
## 5  8592  homicide -90.28375 38.67713 2014     7     1   2014-07  10000        5         50
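To make the column types explicit, the five rows printed above can be read back with base R's `read.csv()` and inspected with `str()`. This is a sketch: the rows are pasted in as text so it runs without the original file. With the file itself you would call `read.csv()` on the filename instead. The object name `crime_head` is hypothetical. Note that ‘xL’ and ‘yL’ are read as numeric, which is consistent with treating them as longitude and latitude.

```r
# The five rows printed above, pasted in as CSV text so the sketch runs
# without the original file. With the real file you would instead use
# read.csv("crimeStLouis20132014b.csv").
rows_csv <- "recno,crimetype,xL,yL,year,month,count,codemonth,crimet,district,Neighborho
45419,homicide,-90.29043,38.64133,2013,9,1,2013-09,10000,2,82
45417,homicide,-90.29043,38.64133,2013,9,1,2013-09,10000,2,82
5825,homicide,-90.28759,38.65942,2014,7,1,2014-07,10000,5,48
5826,homicide,-90.28759,38.65942,2014,7,1,2014-07,10000,5,48
8592,homicide,-90.28375,38.67713,2014,7,1,2014-07,10000,5,50"

crime_head <- read.csv(text = rows_csv, stringsAsFactors = FALSE)

# Column types: xL/yL numeric, codemonth character (not parsed as a date).
str(crime_head)
```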
Creating lists of the unique values in each variable shows the range of the data in each category. As seen below, there are 3 crime types, occurring in 11 different months across 2 years, spread over 10 district codes (0 through 9) and 77 neighborhoods.
## [1] "homicide" "arson" "dui"
## [1] 9 7 2 3 11 5 10 12 8 4 6
## [1] 2013 2014
## [1] 2 5 7 1 6 8 9 3 4 0
length(listneigh)
## [1] 77
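The lists above can be built with base R's `unique()` and counted with `length()`. `unique()` returns values in order of first appearance, which is why the month and district lists are unsorted. In this sketch the small data frame is made-up illustrative data that uses the same column names as crimeStLouis20132014b.csv. Every list name except `listneigh` is a hypothetical choice.

```r
# Made-up illustrative records (not the St Louis data) with the same column names.
crime <- data.frame(
  crimetype  = c("homicide", "arson", "dui", "arson", "dui", "homicide"),
  month      = c(9, 7, 2, 7, 3, 9),
  year       = c(2013, 2014, 2014, 2014, 2014, 2013),
  district   = c(2, 5, 7, 5, 0, 2),
  Neighborho = c(82, 48, 12, 50, 3, 82),
  stringsAsFactors = FALSE
)

# unique() keeps the order of first appearance.
listtype     <- unique(crime$crimetype)
listmonth    <- unique(crime$month)
listyear     <- unique(crime$year)
listdistrict <- unique(crime$district)
listneigh    <- unique(crime$Neighborho)

# length() counts the distinct values in each list.
sapply(list(type = listtype, month = listmonth, year = listyear,
            district = listdistrict, neighborhood = listneigh), length)
```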
Summarizing the data further shows that 557 crimes were recorded in St Louis from 2013 to 2014. The summary does not help explain ‘recno’ or ‘crimet’. The large, fine-grained ‘recno’ values suggest these are unique identifiers for each crime record. The clustered ‘crimet’ values (for example, the minimum and first quartile are both 10000) suggest these could be official codes used to classify types of crime.
## recno crimetype xL yL
## Min. : 235 Length:557 Min. :-90.33 Min. :38.54
## 1st Qu.:14284 Class :character 1st Qu.:-90.26 1st Qu.:38.60
## Median :24229 Mode :character Median :-90.24 Median :38.65
## Mean :26208 Mean :-90.24 Mean :38.64
## 3rd Qu.:39623 3rd Qu.:-90.22 3rd Qu.:38.67
## Max. :52145 Max. :-90.18 Max. :38.76
## year month count codemonth
## Min. :2013 Min. : 2.000 Min. :1 Length:557
## 1st Qu.:2013 1st Qu.: 4.000 1st Qu.:1 Class :character
## Median :2014 Median : 8.000 Median :1 Mode :character
## Mean :2014 Mean : 6.996 Mean :1
## 3rd Qu.:2014 3rd Qu.: 9.000 3rd Qu.:1
## Max. :2014 Max. :12.000 Max. :1
## crimet district Neighborho
## Min. : 10000 Min. :0.000 Min. : 0.00
## 1st Qu.: 10000 1st Qu.:2.000 1st Qu.:18.00
## Median : 83000 Median :4.000 Median :50.00
## Mean :106830 Mean :4.129 Mean :42.53
## 3rd Qu.:211000 3rd Qu.:6.000 3rd Qu.:64.00
## Max. :212000 Max. :9.000 Max. :83.00
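The two hunches about ‘recno’ and ‘crimet’ can be checked directly rather than read off the summary. The minimal sketch below uses a made-up data frame, not the St Louis values. `anyDuplicated()` returns 0 only when no ‘recno’ value repeats. A cross-tabulation of ‘crimetype’ against ‘crimet’ shows whether each code belongs to exactly one crime type, as a classification code would.

```r
# Made-up illustrative records; the real data would come from read.csv().
crime <- data.frame(
  recno     = c(101, 102, 103, 104, 105, 106),
  crimetype = c("homicide", "homicide", "arson", "arson", "dui", "dui"),
  crimet    = c(100, 100, 200, 200, 300, 301),
  stringsAsFactors = FALSE
)

# TRUE when no recno value appears more than once (consistent with an ID).
recno_is_unique <- anyDuplicated(crime$recno) == 0

# Rows are crime types, columns are crimet codes; cells count records.
codes_by_type <- table(crime$crimetype, crime$crimet)

# Number of different crime types each crimet code is used with.
types_per_code <- colSums(codes_by_type > 0)
recno_is_unique
types_per_code
```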
Looking at the number of crimes by crime type (Figure 5) shows which crime types were recorded and how often each occurred. Arson was the most common crime across the two years, with 228 occurrences; homicide was the least common, with 143. Adding police district as a second variable (Figure 6) shows that districts 1-6 have considerably more crimes than districts 7-9. A more detailed view of the districts can be seen in Figure 11.
It is interesting to note that in 2014 there were only 6 police districts in St Louis, yet the crime data contain occurrences for districts 0-9. This suggests the districts were likely redrawn at the end of 2013, producing the discrepancy. Figure 6 also reveals an outlier: a DUI recorded in the nonexistent district 0. More detailed analysis should remove this record.
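Counts like those in Figures 5 and 6 can be computed with base R's `table()`. The removal of the district 0 record can be done with `subset()`. The data frame below is made-up illustrative data, not the St Louis records.

```r
# Made-up illustrative records (not the St Louis data).
crime <- data.frame(
  crimetype = c("arson", "arson", "homicide", "dui", "dui", "arson"),
  district  = c(6, 5, 6, 7, 0, 6),
  stringsAsFactors = FALSE
)

# Crimes per type, most common first.
by_type <- sort(table(crime$crimetype), decreasing = TRUE)

# Crimes per type within each district (rows = district, columns = type).
by_type_district <- table(crime$district, crime$crimetype)

# Drop records coded to the nonexistent district 0 before further analysis.
crime_clean <- subset(crime, district != 0)

by_type
by_type_district
```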
A closer look at the number of crimes at different temporal levels yields interesting results. Figure 7 shows the number of crimes in each month with recorded occurrences. August has the most crime over the two observed years, with 99 of the 557 occurrences. Figure 9 explains why: August is the only month for which data were collected in both years, which inflates its total in Figure 7.
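One way to check which months were observed in more than one year is to cross-tabulate year against month. This sketch uses made-up illustrative records. `colSums()` over the non-empty cells counts the years in which each month appears. Dividing each month's total by that number gives an average per observed year that is not inflated by double coverage.

```r
# Made-up illustrative records (not the St Louis data).
crime <- data.frame(
  year  = c(2013, 2013, 2013, 2014, 2014, 2014, 2014),
  month = c(8, 8, 9, 7, 8, 8, 3)
)

# Records per year-month cell (rows = year, columns = month).
ym <- table(crime$year, crime$month)

# Number of years in which each month has at least one record.
years_per_month <- colSums(ym > 0)

# Months covered in more than one year.
repeated_months <- names(years_per_month)[years_per_month > 1]

# Average crimes per observed year for each month.
monthly_mean <- colSums(ym) / years_per_month

ym
repeated_months
monthly_mean
```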
There also appears to be a seasonal trend in the crime rates, with more crimes occurring in the spring and late summer. Figure 8 shows an increase in crime from 2013 to 2014. Again, Figure 9 suggests this is likely because 2013 has only 4 months of observations, while 2014 has 7.
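To compare years with unequal coverage, one option is to divide each year's total by the number of months observed in that year. The minimal sketch below uses made-up illustrative records. `tapply()` counts the distinct months per year.

```r
# Made-up illustrative records (not the St Louis data).
crime <- data.frame(
  year  = c(2013, 2013, 2014, 2014, 2014, 2014),
  month = c(9, 10, 2, 3, 3, 4)
)

# Raw totals per year.
per_year <- table(crime$year)

# Distinct months with at least one record in each year.
months_observed <- tapply(crime$month, crime$year, function(m) length(unique(m)))

# Crimes per observed month puts the two years on a common footing.
rate_per_month <- as.vector(per_year) / as.vector(months_observed)
names(rate_per_month) <- names(per_year)
rate_per_month
```

In this toy example the raw totals double from 2 to 4, but the per-month rate rises only from 1 to about 1.33.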
When the data are mapped within the city limits (Figure 10), crimes appear fairly evenly distributed. Crimes are less likely to occur in parks and most often take place in residential areas.
The spatial breakdown of crime types in Figure 12 shows clear hotspots in districts 6 (111 occurrences) and 5 (87). These districts are more residential, which is consistent with the crimes committed there being of a personal nature. DUIs were spread out over the study area, likely because the people arrested for DUI were on the move at the time of arrest.
By using exploratory and descriptive methods to analyse the St Louis crime data, we were able to observe spatial trends that could motivate more inferential analysis in the future. In some places the data need to be adapted before a stationary model can be established. Points outside the study area should be removed. The police districts should also be updated to the most recent zones instead of mixing two layouts. Mixing two zoning schemes is an instance of the modifiable areal unit problem (MAUP) and should be resolved.
Care should be taken when drawing conclusions about the temporal pattern of the crimes at the yearly and monthly level. The data are not spread equally across those time frames: only 4 months of 2013 and 7 months of 2014 are covered. Notably, there are no recorded data for January. Every other month has at least 33 recorded crimes, so a month with none is unlikely and more probably reflects missing data.
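The gap in coverage can be found mechanically by comparing the months present in the data against all twelve. This sketch reuses the distinct month values listed earlier in this section. `month.name` is base R's built-in vector of English month names.

```r
# Distinct month values as listed earlier (order of first appearance).
months_present <- c(9, 7, 2, 3, 11, 5, 10, 12, 8, 4, 6)

# Month numbers 1-12 that never appear in the data.
missing_months <- setdiff(1:12, months_present)
month.name[missing_months]
```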
There was one outlier in the data, a crime that was documented outside the study area. In further analysis, this point should be removed.
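Removing points that fall outside the study area is a point-in-polygon test. Packages such as sf provide predicates like `st_within()` that work directly against a boundary shapefile. The self-contained sketch below instead implements the classic even-odd (ray casting) rule in base R. The function name `point_in_polygon`, the rectangular boundary, and the three points are all hypothetical stand-ins for the real boundary in stl_boundary.shp and the real crime coordinates.

```r
# Even-odd (ray casting) rule: a point is inside a polygon if a horizontal ray
# from it crosses the polygon's edges an odd number of times.
point_in_polygon <- function(x, y, px, py) {
  inside <- FALSE
  j <- length(px)
  for (i in seq_along(px)) {
    # Does the edge from vertex j to vertex i straddle the horizontal line at y?
    if ((py[i] > y) != (py[j] > y)) {
      # x-coordinate where that edge crosses the line
      x_cross <- px[i] + (y - py[i]) * (px[j] - px[i]) / (py[j] - py[i])
      if (x < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical rectangular boundary and points (longitude, latitude).
boundary_x <- c(-90.33, -90.18, -90.18, -90.33)
boundary_y <- c(38.54, 38.54, 38.78, 38.78)
crime <- data.frame(xL = c(-90.24, -90.29, -90.10), yL = c(38.64, 38.66, 38.64))

# Keep only the points that fall inside the boundary.
keep <- mapply(point_in_polygon, crime$xL, crime$yL,
               MoreArgs = list(px = boundary_x, py = boundary_y))
crime_in_area <- crime[keep, ]
crime_in_area
```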
By stratifying the data by crime type in Figure 12, we are able to see the more prominent spatial trends, namely the dispersion of DUIs across the city versus the more localized arson and homicide occurrences.
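The contrast between dispersed DUIs and localized arson and homicide could be quantified rather than judged by eye. One simple option is the mean nearest-neighbor distance within each crime type, computed here with base R's `dist()`. This is sketched as an assumption, not as the method used for Figure 12. Smaller values indicate tighter clustering. The coordinates are made-up values in arbitrary planar units. Real longitude/latitude would first need projecting so that distances are meaningful.

```r
# Mean distance from each point to its nearest neighbor.
mean_nn_distance <- function(x, y) {
  d <- as.matrix(dist(cbind(x, y)))  # pairwise Euclidean distances
  diag(d) <- Inf                      # exclude each point's zero distance to itself
  mean(apply(d, 1, min))
}

# Made-up illustrative points in arbitrary planar units.
crime <- data.frame(
  crimetype = c(rep("arson", 4), rep("dui", 4)),
  x = c(0, 1, 0, 1, 0, 10, 0, 10),
  y = c(0, 0, 1, 1, 0, 0, 10, 10),
  stringsAsFactors = FALSE
)

# Apply the measure separately to each crime type.
nn_by_type <- sapply(split(crime, crime$crimetype),
                     function(d) mean_nn_distance(d$x, d$y))
nn_by_type
```

The mean nearest-neighbor distance shrinks as the number of points grows. Groups of different sizes are therefore better compared with a normalized index such as the Clark-Evans ratio, which divides the observed mean by its expected value under complete spatial randomness.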