The Philippine National Police (PNP) publishes some crime reports on the Bantay Krimen website, which I have downloaded and loaded into R. These reports cover the National Capital Region (NCR) only.
library(rjson)

# Read the raw JSON export and flatten the list of records into a data frame.
y <- fromJSON(file = "~/projects/pnp-crime-stats/data/crime_stats.json")
df <- data.frame(matrix(unlist(y), nrow = length(y), byrow = TRUE))
names(df) <- names(y[[1]])

# Parse the report date into a proper Date column for time-series work.
df$date2 <- as.Date(df$date)
The data represents a total of 3298 incidents between 2015-07-21 and 2016-06-01. For each incident we have the following fields:
lng, lat, date, customdate, time, customtime, crime, location, region, province, modus, crimetype, station, moduscode, date2
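Those figures are easy to sanity-check against the data frame itself (a quick sketch, assuming the loading code above ran and the dates parsed cleanly):

nrow(df)           # total incidents, should be 3298
range(df$date2)    # should span 2015-07-21 to 2016-06-01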
How well is each crime type represented in the overall dataset? Here we graph the count for each.
suppressMessages(library(ggvis))
suppressMessages(library(dplyr))

df %>%
  group_by(crime) %>%
  summarize(count = n()) %>%
  ggvis(~crime, ~count) %>%
  layer_bars() %>%
  add_axis("x", properties = axis_props(
    labels = list(angle = 45, align = "left", fontSize = 12)
  ))
There are four types of crimes represented in the database, which we can confirm directly from the data below:

1. ANTI-CARNAPPING ACT (R.A. 6539) MC (Motorcycle Napping)
2. ANTI-CARNAPPING ACT (R.A. 6539) MV (Carnapping)
3. ROBBERY
4. THEFT
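A quick check in base R, which also shows how many reports fall under each label:

table(as.character(df$crime))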
We can get an overall idea of the trend in these crime reports by summarizing them as a single time series and fitting a model. Here we'll use LOESS with a standard-error band to highlight any trend, though it is not a particularly good model for this data.
df %>%
  group_by(date2) %>%
  summarize(count = n()) %>%
  ggvis(~date2, ~count) %>%
  layer_lines(stroke := "lightblue") %>%
  layer_model_predictions(model = "loess", se = TRUE) %>%
  add_axis("x", properties = axis_props(
    labels = list(angle = 45, align = "left", fontSize = 12)
  ))
## Guessing formula = count ~ date2
The daily report volume drops very low before falling off completely, so these statistics probably aren't accurate anymore, if they ever were. It is difficult to ask police to comply with the extra reporting procedures.
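To put numbers on that drop-off, we can total the reports by month instead of by day (a sketch using the dplyr already loaded; format() on the Date column gives a year-month key):

df %>%
  mutate(month = format(date2, "%Y-%m")) %>%
  group_by(month) %>%
  summarize(count = n())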
Since theft represents the largest volume of reports by crime type, we can view that one crime type on its own:
df %>%
  filter(crime == "THEFT") %>%
  group_by(date2) %>%
  summarize(count = n()) %>%
  ggvis(~date2, ~count) %>%
  layer_lines(stroke := "pink") %>%
  layer_model_predictions(model = "loess", se = TRUE) %>%
  add_axis("x", properties = axis_props(
    labels = list(angle = 45, align = "left", fontSize = 12)
  ))
## Guessing formula = count ~ date2
We can map the crimes as points using the leaflet package. We'll color-code the points by the four crime types and add a popup to describe each crime using the modus text variable.
library(leaflet)

# Four colors, one per crime type. colorFactor() takes a palette and a domain;
# with domain = NULL the levels are inferred from the data at call time.
pal <- colorFactor(c("#000000", "#0000AA", "#AA0000", "#00AA00"), domain = NULL)

# The coordinates were read in as factors, so convert via character to numeric.
df$lat <- as.numeric(as.character(df$lat))
df$lng <- as.numeric(as.character(df$lng))
# Plot each report as a point, colored by crime type, with the modus
# description in the popup.
leaflet(df) %>%
  addTiles() %>%
  addCircleMarkers(~lng, ~lat,
                   color = ~pal(crime),
                   radius = 3,
                   popup = ~modus)
It may be useful to know where crimes are occurring. We can use leaflet's default clustering options to group points by zoom level.
# Same markers as above, grouped into clusters that split apart as you zoom in.
leaflet(df) %>%
  addTiles() %>%
  addCircleMarkers(~lng, ~lat,
                   color = ~pal(crime),
                   radius = 3,
                   popup = ~modus,
                   clusterOptions = markerClusterOptions())
The above clustering is a great start but leaves a bit to be desired. I would like to know where specific types of crimes occur, for example, to avoid parking in a carjacking hotspot. One approach is sketched below.
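One way to get there with leaflet is to give each crime type its own clustered layer group and a control to toggle them, using leaflet's standard group argument and addLayersControl(). This is a sketch, not tuned against this data; note the palette is rebuilt with a fixed domain so each type keeps a stable color across groups:

# Fix the palette's domain so each crime type maps to a stable color.
crime_types <- sort(unique(as.character(df$crime)))
pal2 <- colorFactor(c("#000000", "#0000AA", "#AA0000", "#00AA00"),
                    domain = crime_types)

# Build one clustered marker group per crime type, then add a toggle control.
m <- leaflet() %>% addTiles()
for (ct in crime_types) {
  m <- m %>%
    addCircleMarkers(data = df[df$crime == ct, ],
                     ~lng, ~lat,
                     color = pal2(ct),
                     radius = 3,
                     popup = ~modus,
                     group = ct,
                     clusterOptions = markerClusterOptions())
}
m %>% addLayersControl(overlayGroups = crime_types)

With the layers control in the corner of the map, each crime type can be shown or hidden independently, so the clusters for, say, MV carnapping alone reveal its hotspots.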