I finally signed the lease for my future apartment. Now it’s time to see whether my new neighborhood is safe!

I used this dataset, which contains information about breaking & entering in Montreal.

Read the data and group it by PDQ (police station):

library(readr)
library(dplyr)
library(maptools)
library(rgeos)
library(ggplot2)
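# Each row of the file is one reported incident
crime_df <- read_csv("donneesouvertes-citoyens.csv")

# Count the incidents handled by each police station (PDQ)
crime_grouped_df <- crime_df %>% group_by(PDQ) %>%
                    summarise(N = n())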
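To make the grouping step concrete, here is a minimal sketch on a made-up table. The `toy_crime_df` data frame and its `incident_id` column are hypothetical and only stand in for the real file; the only thing assumed about the real data is that it has a `PDQ` column.

```r
library(dplyr)

# Hypothetical toy data: one row per incident, PDQ is the station number
toy_crime_df <- tibble(
  incident_id = 1:4,
  PDQ = c(38, 38, 26, NA)
)

# Same pattern as above: one output row per PDQ, N = number of incidents
toy_grouped_df <- toy_crime_df %>%
  group_by(PDQ) %>%
  summarise(N = n())

# count() is a shorthand for the same group_by() + summarise(n()) pattern
toy_counted_df <- toy_crime_df %>% count(PDQ, name = "N")

print(toy_grouped_df)
```

Note that incidents with a missing `PDQ` end up in their own `NA` group, so they are counted but will not match any district later on.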

Download the corresponding shapefile from here.

We can then read and clean it:

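sh <- readShapePoly("Limites_PDQ_2016_Lat_Long.shp")
# Keep only the station number from the district name so it can be matched with PDQ
# (as.character() guards against the column having been read as a factor)
sh@data$NOM_LIEU <- parse_number(as.character(sh@data$NOM_LIEU))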

Then we merge crime data with shapefile data:

sh@data <- sh@data %>% left_join(crime_grouped_df, by=c("NOM_LIEU"="PDQ"))
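The rows of `sh@data` have to stay in the same order as the polygons, which is why `left_join()` is a good fit here: it keeps every row of the left table, in its original order. Here is a small sketch with made-up tables (`toy_polygons_df`, `toy_counts_df` and the numbers in them are hypothetical) showing that behaviour, plus an `anti_join()` check for districts that found no match:

```r
library(dplyr)

# Hypothetical attribute table of the shapefile (order = polygon order)
toy_polygons_df <- tibble(NOM_LIEU = c(44, 23, 99))

# Hypothetical incident counts per station
toy_counts_df <- tibble(PDQ = c(23, 44), N = c(120L, 310L))

toy_joined_df <- toy_polygons_df %>%
  left_join(toy_counts_df, by = c("NOM_LIEU" = "PDQ"))

# Row order is unchanged; district 99 has no count, so its N is NA
print(toy_joined_df)

# Districts without any matching row in the counts table
toy_unmatched_df <- toy_polygons_df %>%
  anti_join(toy_counts_df, by = c("NOM_LIEU" = "PDQ"))
print(toy_unmatched_df)
```

Running the same `anti_join()` on the real tables is a quick way to spot districts that will be left uncolored on the map.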

Finally we can plot:

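# Split the districts into 4 groups of (roughly) equal size, from fewest to most incidents
sh@data$ntile <- ntile(sh@data$N, 4)
# The data covers 15 months (201501 to 201603), so N / 15 is the monthly average
sh@data$frequency <- sh@data$N / 15

# One color per risk level, from lowest to highest
colors <- c("khaki2", "goldenrod2", "darkorange1", "firebrick2")
plot_colors <- colors[sh@data$ntile]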

plot(sh, col=plot_colors)

title("Montreal map: breaking & entering",
      sub="Data: from 201501 to 201603\n(source: http://donnees.ville.montreal.qc.ca/dataset/actes-criminels)",
      cex.sub=0.8)

legend("topleft", legend=c("Very low risk", "Low risk", "Medium risk", "High risk"),
        fill=colors, cex=0.75)

centroids <- gCentroid(sh, byid=TRUE)
indexes <- sh@data$NOM_LIEU %in% c(23, 26, 38, 44)
text(centroids$x[indexes], centroids$y[indexes], sh@data$NOM_LIEU[indexes], cex=0.6)
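If the `ntile()` step looks a bit magical, here is a minimal sketch of how counts turn into colors. The `toy_counts` values are made up for illustration; only the color palette comes from the code above.

```r
library(dplyr)

# Hypothetical incident counts for nine districts; NA = district without data
toy_counts <- c(10, 200, 35, NA, 80, 150, 60, 20, 120)

# ntile() ranks the values and cuts them into 4 buckets of equal size;
# missing values stay NA
toy_ntile <- ntile(toy_counts, 4)

# Same palette as on the map: bucket 1 = lightest, bucket 4 = darkest
colors <- c("khaki2", "goldenrod2", "darkorange1", "firebrick2")
toy_plot_colors <- colors[toy_ntile]

print(data.frame(count = toy_counts, ntile = toy_ntile, color = toy_plot_colors))
```

Because the buckets are built from ranks, each risk level contains about the same number of districts, whatever the actual gaps between the counts are. A district with no data gets an `NA` color, which as far as I know base graphics draws as an unfilled polygon.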

Here is what the risk levels mean in terms of monthly incident frequency:

ggplot(data=sh@data %>% na.omit(), aes(factor(ntile), frequency)) + 
  geom_boxplot(fill="lavenderblush") + 
  labs(x="Risk level", y="Monthly incident frequency", 
       title="Risk levels in terms of monthly frequency of breaking & entering") +
  scale_x_discrete(labels=c("Very low risk", "Low risk", "Medium risk", "High risk"))
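The boxplot can also be read as a small table. Here is a sketch, on made-up counts (`toy_df` is hypothetical, not the Montreal data), of how to get the range of monthly frequencies covered by each risk level:

```r
library(dplyr)

# Hypothetical incident counts over 15 months for eight districts
toy_df <- tibble(N = c(30, 45, 90, 120, 150, 210, 300, 450)) %>%
  mutate(ntile = ntile(N, 4),   # risk level, 1 = lowest
         frequency = N / 15)    # average incidents per month

# Lowest and highest monthly frequency inside each risk level
toy_summary_df <- toy_df %>%
  group_by(ntile) %>%
  summarise(min_freq = min(frequency), max_freq = max(frequency))

print(toy_summary_df)
```

Applied to `sh@data %>% na.omit()`, the same `group_by(ntile)` summary should give the exact cut-offs behind the four labels.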

As for the red areas on the map, they are:

Oh, and in case you’re wondering… my neighborhood is “Very low risk” :)