Another data set I wanted to investigate from Madison’s Open Data Portal focuses on police data. My interest in crime data goes back to my childhood. I was born in Indianapolis, and one of my memories of our house there is the security alarm. Every time we came home the alarm would sound, and my parents would have to enter the code to turn it off. They installed the alarm after some teenagers broke into our house and stole several items, including a video recorder containing videos of me as a baby. Sadly, I can’t view that part of my childhood, but on the plus side my parents lost some of their ammunition for embarrassing the teenage version of me. The news in Indianapolis regularly carried reports of theft and homicide.
This is not to paint Indianapolis as a dangerous place, but to show the contrast with Marquette, MI, where we moved when I was 6 and where I lived until I was 18. Marquette is a very beautiful town on Lake Superior in Michigan’s Upper Peninsula. It’s the type of town where people rarely lock their cars. One of our pastimes is reading the police log for funny reports. Here is an example from a log my friend recently posted: “Mining Journal Police Log: 10:03 a.m., reports of a suspicious silver and shiny substance falling from the sky. Caller reported that it looked like a snowstorm but knew it was not snowing, worried that it was a hazardous substance. Caller realized before officers arrived that it was, in fact, snow after it melted in her hand.” There is crime, of course: I’ve had my bike stolen from a bike rack (I didn’t have a lock), a friend had his bike stolen from his porch (he did have a lock), and I know of cases where someone tried to steal tools from a house at night. Overall, though, it’s a very safe community. One of my general beliefs is that a community needs trust to grow, and crime corrodes trust.
I have many questions about crime and policing, but for this data set I’m going to spend a few posts on general exploration and maybe get into some forecasting.
source("policeAssessment.R") # helper functions used below, such as updateTimes() (script not shown)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(forecast)
## Loading required package: timeDate
## This is forecast 7.0
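dfCalls <- read.csv("callsForService.csv")  # police calls for service
dfCalls <- updateTimes(dfCalls)             # derive the date fields (e.g. Date) used below
dfPol <- read.csv("policeReports.csv")      # police incident reports
dfPol$Date.Time <- dfPol$Incident.Date      # incident reports store their timestamp in Incident.Date
dfPol <- updateTimes(dfPol)                 # derive the date fields (e.g. YearMonth) used below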
dim(dfCalls)
## [1] 581377 10
dim(dfPol)
## [1] 9202 17
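The helper script policeAssessment.R isn’t shown, so we can’t see what updateTimes() actually does. Judging only from how its output is used later (a Date column for daily counts and a YearMonth column with labels like “12-01”), a helper along these lines would fit. This is a hypothetical sketch: the name updateTimesSketch is made up, and the timestamp format “%Y-%m-%d %H:%M:%S” is an assumption about the raw CSV.

```r
# Hypothetical sketch only: the real updateTimes() lives in policeAssessment.R (not shown).
# ASSUMPTION: Date.Time holds strings like "2015-06-01 14:30:00".
updateTimesSketch <- function(df) {
  # parse the timestamp (assumed format) into POSIXct
  df$Date.Time <- as.POSIXct(df$Date.Time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  # calendar day, as used for the daily call counts
  df$Date <- as.Date(df$Date.Time)
  # "yy-mm" labels, matching the monthly tables (e.g. "12-01")
  df$YearMonth <- format(df$Date.Time, "%y-%m")
  df
}

# Toy input for illustration only, not real Madison data
toy <- data.frame(Date.Time = c("2012-01-15 08:00:00", "2012-02-03 23:45:00"),
                  stringsAsFactors = FALSE)
toy <- updateTimesSketch(toy)
toy$YearMonth
```

Keeping YearMonth as a plain “yy-mm” string is what lets the later code rebuild a proper Date with paste(..., "-01") and the “%y-%m-%d” format.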
Madison has four distinct seasons, so it seems reasonable to expect fewer calls to police in the winter. Many factors could contribute, but my guess is that it’s largely because people spend more time indoors. The monthly call counts show a very consistent drop in the winter months and a sharp rise in the summer months, which supports that hypothesis.
##
## 12-01 12-02 12-03 12-04 12-05 12-06 12-07 12-08 12-09 12-10 12-11 12-12
## 9629 9614 11797 10758 13352 13118 13361 13752 12485 11635 9989 10087
## 13-01 13-02 13-03 13-04 13-05 13-06 13-07 13-08 13-09 13-10 13-11 13-12
## 8498 9462 10326 11597 12358 12784 12865 13413 12248 12284 10034 9515
## 14-01 14-02 14-03 14-04 14-05 14-06 14-07 14-08 14-09 14-10 14-11 14-12
## 9531 9255 10242 11176 12225 12159 12791 13229 12539 11662 9854 9570
## 15-01 15-02 15-03 15-04 15-05 15-06 15-07 15-08 15-09 15-10 15-11 15-12
## 9732 9160 10405 11364 11962 12706 13440 12707 12657 12213 10384 10561
## 16-01 16-02
## 10437 9605
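The code that produced this monthly table isn’t shown, but the seasonal claim can be checked directly from the printed counts. As a minimal sketch (base R’s ts(), cycle() and tapply() are my choice here, and the variable names are illustrative, not from the original analysis), we can average each calendar month across years and compare winter with summer:

```r
# Monthly call counts copied from the table above (Jan 2012 through Feb 2016)
calls <- c(9629, 9614, 11797, 10758, 13352, 13118, 13361, 13752, 12485, 11635, 9989, 10087,
           8498, 9462, 10326, 11597, 12358, 12784, 12865, 13413, 12248, 12284, 10034, 9515,
           9531, 9255, 10242, 11176, 12225, 12159, 12791, 13229, 12539, 11662, 9854, 9570,
           9732, 9160, 10405, 11364, 11962, 12706, 13440, 12707, 12657, 12213, 10384, 10561,
           10437, 9605)
callsTS <- ts(calls, start = c(2012, 1), frequency = 12)

# average number of calls for each calendar month across the available years
monthlyMean <- tapply(callsTS, cycle(callsTS), mean)
names(monthlyMean) <- month.abb
round(monthlyMean)

# compare the winter (Dec-Feb) and summer (Jun-Aug) averages
winter <- mean(monthlyMean[c("Dec", "Jan", "Feb")])
summer <- mean(monthlyMean[c("Jun", "Jul", "Aug")])
summer / winter
```

On these counts August has the highest monthly average and February the lowest, and the June to August average comes out roughly 35% above the December to February average.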
We can still see this pattern when summarizing on a daily level.
callDaySummary <- table(dfCalls[dfCalls$Date.Time<"2016-03-01 00:00:00",]$Date) # keep calls through Feb 2016; more recent data is incomplete
callDaySummary <- as.data.frame(callDaySummary)
names(callDaySummary) <- c("Date","Freq")
callDaySummary$Date <- as.Date(callDaySummary$Date)
callDateSummaryXTS <- as.xts(callDaySummary$Freq,order.by=callDaySummary$Date)
dygraph(callDateSummaryXTS,ylab="Police Call Logs",xlab="Date")
We also have a data set of police incident reports. It doesn’t include every incident, but it contains a bit more detail than the call logs. Let’s see if the same seasonal trend shows up.
polSummary <- table(dfPol[dfPol$Date.Time<"2015-11-01 00:00:00"&dfPol$Date.Time>="2009-01-01 00:00:00",]$YearMonth) # monthly report counts, Jan 2009 through Oct 2015
print(polSummary)
##
## 09-01 09-02 09-03 09-04 09-05 09-06 09-07 09-08 09-09 09-10 09-11 09-12
## 86 81 93 74 110 86 102 91 111 88 90 76
## 10-01 10-02 10-03 10-04 10-05 10-06 10-07 10-08 10-09 10-10 10-11 10-12
## 92 61 93 94 113 83 113 97 80 90 95 74
## 11-01 11-02 11-03 11-04 11-05 11-06 11-07 11-08 11-09 11-10 11-11 11-12
## 66 60 70 82 103 105 113 109 84 117 90 109
## 12-01 12-02 12-03 12-04 12-05 12-06 12-07 12-08 12-09 12-10 12-11 12-12
## 93 81 89 101 102 89 111 87 105 97 86 67
## 13-01 13-02 13-03 13-04 13-05 13-06 13-07 13-08 13-09 13-10 13-11 13-12
## 78 77 89 87 80 73 101 108 103 93 93 69
## 14-01 14-02 14-03 14-04 14-05 14-06 14-07 14-08 14-09 14-10 14-11 14-12
## 79 37 64 70 79 83 69 65 86 88 73 78
## 15-01 15-02 15-03 15-04 15-05 15-06 15-07 15-08 15-09 15-10
## 61 59 54 76 92 88 52 68 83 78
polSummary <- as.data.frame(polSummary)
names(polSummary) <- c("Date","Freq")
polSummary$Date <- as.Date(paste(polSummary$Date,"-01",sep=""),"%y-%m-%d")
polSummaryXTS <- as.xts(polSummary$Freq,order.by=polSummary$Date)
dygraph(polSummaryXTS,ylab="Police Incident Reports",xlab="Month")
Well, that’s not nearly as clean as the call logs. I expected the seasonality to look far more like the pattern in police calls. We can still make some generalizations: in every year from 2009 through 2013, July had more reports than either January or February, but in 2014 July fell below January, and in 2015 it fell below both. That pattern isn’t nearly as distinct as the one in the call logs. This is an area where I would want to investigate the provenance of the data further. Maybe these reports only cover more severe crimes, or cases where police identified a victim or arrested someone. We also know that this is only a subset of all incidents: the data may be limited to the reports police have the resources to document, or this may simply be the actual trend. If I were to do significantly more work with this data, these are some of the questions I would want to explore.
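A quick way to see where the winter-versus-July comparison holds is to pull those three months out of the printed incident-report table and compare them year by year. This is a minimal sketch in base R using only the counts shown above; the data frame and column names are illustrative.

```r
# January, February and July incident-report counts, copied from the table above
reports <- data.frame(
  year = 2009:2015,
  jan  = c(86, 92, 66, 93, 78, 79, 61),
  feb  = c(81, 61, 60, 81, 77, 37, 59),
  jul  = c(102, 113, 113, 111, 101, 69, 52)
)
# TRUE when July had more reports than both winter months that year
reports$julyAboveBoth <- reports$jul > pmax(reports$jan, reports$feb)
reports
```

For these counts the check is TRUE for 2009 through 2013 and FALSE for 2014 and 2015: July 2014 (69) fell below January 2014 (79), and July 2015 (52) fell below both winter months.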
Let’s dig into a few specific crimes. I used to think of Michael Scott from The Office when I thought of fraud and scams. However, after listening to a recent Planet Money podcast about scams, I realized how badly they can hurt people. Reading the police reports gives a clearer picture of the real people affected by these scams. I was curious how these cases are distributed over time.
polSummaryFraud <- dfPol[(dfPol$Date.Time<"2015-11-01 00:00:00"&dfPol$Date.Time>="2009-01-01 00:00:00" & dfPol$Incident.Type=="Fraud"),] # fraud reports, Jan 2009 through Oct 2015
fraudSummary <- table(polSummaryFraud$YearMonth) # monthly counts (date range already applied above)
print(fraudSummary)
##
## 09-01 09-02 09-03 09-04 09-05 09-06 09-07 09-08 09-09 09-10 09-11 09-12
## 5 7 4 4 1 1 1 4 4 1 5 3
## 10-01 10-02 10-03 10-04 10-05 10-06 10-07 10-08 10-09 10-10 10-11 10-12
## 6 6 6 6 5 2 4 3 4 5 4 4
## 11-02 11-03 11-04 11-05 11-06 11-07 11-08 11-09 11-10 11-11 11-12 12-01
## 1 2 5 6 4 2 7 3 4 3 11 6
## 12-02 12-03 12-04 12-05 12-06 12-07 12-08 12-09 12-10 12-11 12-12 13-01
## 3 1 4 1 3 7 4 2 3 4 2 6
## 13-02 13-03 13-04 13-05 13-06 13-07 13-08 13-09 13-10 13-11 13-12 14-01
## 4 3 4 3 2 6 3 3 3 5 2 3
## 14-02 14-03 14-04 14-05 14-06 14-07 14-08 14-09 14-10 14-11 14-12 15-01
## 2 8 5 4 1 1 4 4 3 6 2 2
## 15-03 15-04 15-05 15-06 15-08 15-09 15-10
## 1 2 2 3 2 6 4
fraudSummary <- as.data.frame(fraudSummary)
names(fraudSummary) <- c("Date","Freq")
fraudSummary$Date <- as.Date(paste(fraudSummary$Date,"-01",sep=""),"%y-%m-%d")
fraudXTS <- as.xts(fraudSummary$Freq,order.by=fraudSummary$Date)
dygraph(fraudXTS,ylab="Fraud Reports",xlab="Month")
As you can see, there aren’t many fraud cases in our data, and I would guess this is a crime that often goes unreported. There doesn’t appear to be an obvious winter-versus-summer pattern in the fraud counts either, and this may be a data set where analyzing the text of the reports is more useful than counting incidents. Reading over some of the text, I appreciated that many of the people reporting said they went to the police in hopes of spreading awareness of the scam so that other people would not fall victim to it. One case in particular involved a grandmother paying $9,500 for what she believed was her grandson’s bail. Sometimes qualitative data paints a better picture than the numbers, and I think this is one of those cases. You might have noticed a large spike of 11 incidents in December 2011. My first thought was that the spike was tied to the holidays, but after looking more closely I saw that two cases involved counterfeit money and the other nine were mostly phone or Craigslist scams.
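The fraud table also has a quirk worth knowing about: months with no fraud reports (11-01, 15-02 and 15-07) simply don’t appear, because table() on the YearMonth labels only counts values that actually occur. When that table is turned into an xts series, those months are not plotted as zeros; the line just skips from one reported month to the next. A minimal sketch of filling the gaps, using a hypothetical helper fillMissingMonths() and a short excerpt of the counts above:

```r
# Hypothetical helper, not part of the original analysis:
# expand a named vector of "yy-mm" counts to every month in [from, to], using 0 for gaps.
# Assumes every name in `counts` falls inside the requested range.
fillMissingMonths <- function(counts, from, to) {
  # every month start between `from` and `to`, labelled like YearMonth ("yy-mm")
  allMonths <- format(seq(as.Date(from), as.Date(to), by = "month"), "%y-%m")
  filled <- setNames(rep(0L, length(allMonths)), allMonths)
  # copy over the months that do have reports
  filled[names(counts)] <- as.integer(counts)
  filled
}

# A few fraud counts copied from the table above; 11-01 is absent there
fraudExcerpt <- c("10-11" = 4, "10-12" = 4, "11-02" = 1, "11-03" = 2)
filled <- fillMissingMonths(fraudExcerpt, "2010-11-01", "2011-03-01")
filled
```

The filled vector can then go through the same as.Date(paste(..., "-01", sep=""), "%y-%m-%d") and as.xts() steps used in the original code, so that quiet months show up as zeros on the chart.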
For our last breakdown, let’s look at the homicide and attempted homicide cases in Madison. As I expected, there really aren’t many homicides in Madison. Adding up the table below by calendar month, June through August account for 20 cases compared with 7 for December through February, so the anecdotal view that there are more homicides in summer than in winter seems to hold some weight. With counts this small, though, the more meaningful information comes from reading the police reports and the individual cases to understand the circumstances surrounding the homicides and attempted homicides.
polSummaryHomicide <- dfPol[dfPol$Date.Time<"2015-11-01 00:00:00"&dfPol$Date.Time>="2009-01-01 00:00:00" & ( dfPol$Incident.Type=="Murder/Homicide"| dfPol$Incident.Type=="Attempted Homicide"),] # homicides and attempted homicides, Jan 2009 through Oct 2015
homicideSummary <- table(polSummaryHomicide$YearMonth) # monthly counts (date range already applied above)
print(homicideSummary)
##
## 09-03 09-05 09-06 09-07 09-09 09-11 09-12 10-03 10-04 10-11 11-03 11-05
## 1 1 3 1 2 1 1 1 1 1 1 1
## 11-06 11-07 11-08 11-09 11-10 12-01 12-02 12-05 12-06 12-07 12-09 12-10
## 1 2 2 1 1 1 1 2 3 1 1 1
## 13-01 13-03 13-04 13-06 13-09 13-10 13-11 14-03 14-05 14-07 14-08 14-09
## 1 1 1 2 1 2 1 1 1 2 1 2
## 14-12 15-01 15-02 15-04 15-07 15-08
## 1 1 1 1 1 1
homicideSummary <- as.data.frame(homicideSummary)
names(homicideSummary) <- c("Date","Freq")
homicideSummary$Date <- as.Date(paste(homicideSummary$Date,"-01",sep=""),"%y-%m-%d")
homicideXTS <- as.xts(homicideSummary$Freq,order.by=homicideSummary$Date)
dygraph(homicideXTS,ylab="Homicide and Attempted Homicide Reports",xlab="Month")
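To check the summer-versus-winter question against the homicide table, we can sum the printed counts by calendar month. This is a minimal sketch in base R using the counts shown above; the variable names are illustrative and it is not the post’s original code.

```r
# Homicide + attempted homicide counts copied from the table above ("yy-mm" = count)
homicideCounts <- c(
  "09-03" = 1, "09-05" = 1, "09-06" = 3, "09-07" = 1, "09-09" = 2, "09-11" = 1,
  "09-12" = 1, "10-03" = 1, "10-04" = 1, "10-11" = 1, "11-03" = 1, "11-05" = 1,
  "11-06" = 1, "11-07" = 2, "11-08" = 2, "11-09" = 1, "11-10" = 1, "12-01" = 1,
  "12-02" = 1, "12-05" = 2, "12-06" = 3, "12-07" = 1, "12-09" = 1, "12-10" = 1,
  "13-01" = 1, "13-03" = 1, "13-04" = 1, "13-06" = 2, "13-09" = 1, "13-10" = 2,
  "13-11" = 1, "14-03" = 1, "14-05" = 1, "14-07" = 2, "14-08" = 1, "14-09" = 2,
  "14-12" = 1, "15-01" = 1, "15-02" = 1, "15-04" = 1, "15-07" = 1, "15-08" = 1
)
# calendar month is the part after the dash in "yy-mm"
calMonth <- factor(as.integer(substr(names(homicideCounts), 4, 5)),
                   levels = 1:12, labels = month.abb)
byMonth <- tapply(homicideCounts, calMonth, sum)
byMonth[is.na(byMonth)] <- 0  # a month with no cases in any year would come back as NA
byMonth

sum(byMonth[c("Jun", "Jul", "Aug")])  # summer total
sum(byMonth[c("Dec", "Jan", "Feb")])  # winter total
```

Every calendar month has at least two cases across the seven years, but June through August total 20 cases versus 7 for December through February, out of 54 overall.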
Reading over the individual police reports, one thing that sticks out is that many of these incidents involve someone the victim knows, often very closely. This is in stark contrast to the fraud reports, where most cases involved a stranger or someone the victim may never have met in person. I think we often invent ghosts, like the stranger lurking in the dark, when we think about violent crime. Those cases do happen, but in these reports violence more often seems to grow out of tension within the victim’s own community.