Background

During the Colombian civil war, Medellín, the country’s second largest city, was severely affected by the conflict, and became infamous for having one of the highest murder rates in the world. Since the early 2000’s, Medellín has achieved a remarkable transformation, dramatically reducing violence and crime by connecting poorer neighbourhoods by metro cable and other public transport, as well as by investing heavily in education.

Medellín is often touted as a model for how post-conflict cities can lift themselves out violence and crime. However, despite its progress, the city still struggles with severe violence, poverty, and the continued presence of armed groups, including drug traffickers, paramilitary organizations, and ELN guerrillas.

The following is a short exploratory analysis of violence rates in Medellín from 2010 to 2019. The analysis uses data from Colombia’s extensive online public data repository.

Reproducible data collection and preparation by clicking the ‘code’ button to the right:

# 1. Prepare packages and directory ------------------------------------------

# Data wrangling
library(dplyr)
library(tidyr)
library(stringr)
library(stringi)
# library(janitor)
# library(foreign)
# library(jsonlite)
library("RSocrata")
library("fuzzyjoin") # merging datasets by partially matching strings 
library(compare)
library('MazamaSpatialUtils')

# Plotting
library(tmap)
# library(sp)
library(sf)
library(rgdal)
library(rgeos) # required for maptools
library(maptools)
# library(tmaptools)
# library(leaflet)
library(ggplot2)

# Analysis & reporting
library(lme4)
library(texreg)
library(apaTables)

# 2. Get barrio (neighbourhood) data -----------------------------------------

# Source: Open data obtained from the municipality of Medellín's website:
# url = paste0('https://geomedellin-m-medellin.opendata.arcgis.com/datasets/',
#        'limite-barrio-vereda-catastral/data?geometry=',
#        '-75.627%2C6.228%2C-75.528%2C6.258')
 

## 2.1 Download data
url = paste0('https://opendata.arcgis.com/datasets/',
             '1a6dbf15865b4357aa559699ea90c5b9_7.zip')

# Create temporary directory
td = tempdir()
# Create placeholder file
tf = tempfile(tmpdir=td, fileext=".zip")
# Download into placeholder file
download.file(url, tf, mode = "wb") #"mode='wb'" may not work on non-windows OS

# get the name of the first file in the zip archive
# (fname = unzip(tf, list=TRUE)$Name[1]) # Limite_Barrio_Vereda_Catastral.shx

# Unzip the file
unzip(tf, exdir=td, overwrite=TRUE)

# Read in shapefile with readOGR(): postal_codes
med_map = readOGR(td, "Limite_Barrio_Vereda_Catastral", 
                  encoding = 'ESRI Shapefile')
                  # use_iconv = T)# encoding does not work together with use_iconv 

# st_is_valid(med_map@polygons, reason = TRUE)

## 2.2. Prep data

## Repairing invalid geometries and reducing size
fix_geometry = function(spdf){ # spatialpolygonsdataframe
  ## Set CRS
  #proj4string(med_map) # S CRS
  spdf <- spTransform(spdf, "+proj=longlat +datum=WGS84")
  ## Using procedure by stackexchange user Simbamangu in URL: 
  # url = paste0('https://gis.stackexchange.com/questions/153905/',
  #                'dissolving-unifying-ill-behaved-irregular-polygons-in-r')
  # "Set to a projected CRS - could be anything"
  # [fixes problem where invalid warnings are thrown by gBuffer]
  polysCRS <- spdf@proj4string
  spdf@proj4string <- CRS("+init=epsg:32737")
  # "Now buffer it but preserve individual objects"
  spdf <- gBuffer(spdf, width = 0, byid = T)
  # Reset the original CRS
  spdf@proj4string <- polysCRS
  ## Simplify geometry to drastically reduce file size
  spdf = MazamaSpatialUtils::simplify(spdf)
  # Simplification reintroduces self-intersection, therefore, repair must be done
  # again. Simplification cannot be performed before this because: 
  # "rgeos_PolyCreateComment: orphaned hole"
  spdf@proj4string <- CRS("+init=epsg:32737")
  # "Now buffer it but preserve individual objects"
  spdf <- gBuffer(spdf, width = 0, byid = T)
  # Reset the original CRS
  spdf@proj4string <- polysCRS
  return(spdf)
  # Warning: "CRS object has comment, which is lost in output" is expected:
  # http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html
}

med_map = fix_geometry(med_map)
# gIsValid(med_map) # > TRUE


## Fix encoding of strings and recode for consistency
recode_names <- function(var){
  # Encoding(var) = "UTF-8"
  i = stri_enc_isutf8(var)
  var[!i] = stri_enc_toutf8(var[!i], validate = T)
  var = iconv(var,from="UTF-8",to="ASCII//TRANSLIT") # keep accents
  # var = iconv(var,from="UTF-8",to="latin1//TRANSLIT") # keep accents
  # var = stri_trans_general(var, id = "Latin-ASCII") # drop accents
  var = str_to_title(var, locale = "es")
  var = str_replace_all(var, "Area De Expansion ", "")
  var = str_replace_all(var, "No\\.", "")
  var = gsub("C(-)([0-9]+)", '', var)
  var = gsub("Cno Reportado([0-9]+)", '', var)
  var = gsub("-", " ", var)
  var = gsub("Reportado([0-9]+)", '', var)
  var = var %>% str_replace_all("  ", " ") # double space
  var = var %>% str_replace_all("13 De Noviiembre", "Trece De Noviembre")
  var = var %>% str_replace_all("Al\\.", "Al")
  var = var %>% str_replace_all("B\\. ", "")
  var = var %>% str_replace_all("Barrios", "Barrio")
  var = var %>% str_replace_all("Caycedo", "Caicedo")
  var = var %>% str_replace_all("Calazans", "Calazanz")
  var = var %>% str_replace_all("Centro Administrativo", "Centro")
  var = var %>% str_replace_all("El Corazon-El Morro", "El Corazon El Morro")
  var = var %>% str_replace_all("El Estadio", "Estadio")
  var = var %>% str_replace_all("El Jardin", "Jardin Botanico")
  var = var %>% str_replace_all("El Progreso", "Progreso")
  var = var %>% str_replace_all("El Playon", "Playon")
  var = var %>% str_replace_all("Jardin Botanico", "Jardin")
  var = var %>% str_replace_all("No Reportado", " - ")
  var = var %>% str_replace_all("La Frisola Palmitas", "La Frisola")
  var = var %>% str_replace_all("La Loma De Los Bernal", "Loma De Los Bernal")
  var = var %>% str_replace_all("La Oculta Guayabal", "La Oculta")
  var = var %>% str_replace_all("La Loma De Los Bernal", "Loma De Los Bernal")
  var = var %>% str_replace_all("Los Conquistadores", "Conquistadores")
  var = var %>% str_replace_all("Media Luna Las Palmas", "Media Luna")
  var = var %>% str_replace_all("Media Luna Parte Alta", "Media Luna")
  var = var %>% str_replace_all("Reportado ", "")
  var = var %>% str_replace_all("Bemejal", "Bermejal") #order matters
  var = var %>% str_replace_all("S\\.a\\.p\\.", "")
  var = var %>% str_replace_all("San Jose De La Cima", "San Jose La Cima")
  var = var %>% str_replace_all("S\\.c\\.", "")
  var = var %>% str_replace_all("Sta\\. E\\.", "")
  var = var %>% str_replace_all("Suburbano", "")
  var = var %>% str_replace_all("Vda\\.", "")
  var = var %>% str_replace_all("Villanueva", "Villa Nueva")
  var = var %>% str_replace_all("Villatina", "Villa Tina")
  var = var %>% str_replace_all("Volcana Guayabal", "Guayabal")
  var = var %>% str_replace_all(" - ", " ")
  var = var %>% str_trim(side = "both")
  var = var %>% str_squish() # remove multiple spaces 2
  var = var %>% str_to_title(locale = "es")
  # var = iconv(var,from="UTF-8",to="ASCII//TRANSLIT") # remove accents
  # var = iconv(hom_2010$barrio, to="latin1//TRANSLIT")
  return(var)
}

med_map$NOMBRE_BAR = recode_names(med_map$NOMBRE_BAR)
med_map$NOMBRE_COM = recode_names(med_map$NOMBRE_COM)

# Total of 322 barrios/veredas and 21 comunas/corregimientos
names_med_map_barrios = med_map@data %>% select(NOMBRE_BAR) %>% unique() %>% 
  arrange(desc(NOMBRE_BAR)); #names_med_map_barrios
names_med_map_comunas = med_map@data %>% select(NOMBRE_COM) %>% unique() %>% 
  arrange(desc(NOMBRE_COM)); #names_med_map_comunas;

#write.csv(med_map, "med_map_medellin.csv")

# 3. Get comuna data ---------------------------------------------------------

# Source: Open data obtained from the municipality of Medellín's website:
# https://www.medellin.gov.co
# paste0(https://geomedellin-m-medellin.opendata.arcgis.com/datasets/',
# 'l%C3%ADmite-catastral-de-comunas-y-corregimientos?',
# 'geometry=-75.703%2C6.245%2C-75.504%2C6.305')

## 3.1 Download data
url = paste0('https://opendata.arcgis.com/datasets/',
             '283d1d14584641c9971edbd2f695e502_6.zip')

# Create temporary directory
td = tempdir()
# Create placeholder file
tf = tempfile(tmpdir=td, fileext=".zip")
# Download into placeholder file
download.file(url, tf, mode = "wb") #"mode='wb'" may not work on non-windows OS

# get the name of the first file in the zip archive
# unzip(tf, list=TRUE)$Name[1]; # 

# unzip the file
unzip(tf, exdir=td, overwrite=TRUE)

# Read in shapefile with readOGR(): postal_codes
comunas = readOGR(td,"L%C3%ADmite_Catastral_de__Comunas_y_Corregimientos")

## 3.2. Population data
# Source: Colombian open data location
# url = paste0('https://www.datos.gov.co/Estad-sticas-Nacionales/',
#              'Proyecciones-De-Poblaci-n-Medell-n-2016-2020/imj6-7tfq')

comunas_pop <- read.socrata("https://www.datos.gov.co/api/odata/v4/imj6-7tfq")

# Get total population in 2018
comunas_pop = comunas_pop %>%
  group_by(c_digo) %>%
  summarize(pop_2017 = sum(total_2017, na.rm = T))

## 3.3. Prep data

## Repairing invalid geometries and reducing size
# gIsValid(comunas) # > Self-intersection at or near point [one coordinate]
comunas = fix_geometry(comunas)
# gIsValid(comunas) # > TRUE

## Fix strings

# Spatial data
comunas$NOMBRE = recode_names(comunas$NOMBRE)

names_comunas = comunas@data %>% select(NOMBRE) %>% unique() %>% 
  arrange(desc(NOMBRE)); #names_comunas;

# Put comuna names first for plotting purposes
comunas@data = comunas@data %>% select(NOMBRE, everything())

# Population data
comunas_pop$c_digo = recode_names(comunas_pop$c_digo)
comunas_pop$c_digo = recode(comunas_pop$c_digo, 'Laureles Estadio' = "Laureles", 
                            'San Antonio' = "San Antonio De Prado")

comunas_pop = comunas_pop %>% select(c_digo, pop_2017)

# names = comunas_pop$c_digo
# names2 = comunas$NOMBRE
# names3 = unique(med_map$NOMBRE_COM)
# sort(unique(c(names, names2, names3))) # Visual inspection of names

comunas_pop$c_digo = gsub(' - Estadio', '', comunas_pop$c_digo) # correct one 

# 4. Get transport data ------------------------------------------------------

# Source: "Corredores para Transporte de Pasajeros": 
# https://www.medellin.gov.co/geonetwork

## 3.1 Download data
url = paste0('https://opendata.arcgis.com/datasets/',
             'c33b78a7e66e4f5faab0c624e1836d5e_13.zip')

# Create temporary directory
td = tempdir()
# Create placeholder file
tf = tempfile(tmpdir=td, fileext=".zip")
# Download into placeholder file
download.file(url, tf, mode = "wb") #"mode='wb'" may not work on non-windows OS

# get the name of the first file in the zip archive
# unzip(tf, list=TRUE)$Name[1] # "Estaciones.cpg"

# unzip the file
unzip(tf, exdir=td, overwrite=TRUE)

# Read in shapefile with readOGR(): postal_codes
# transport = readOGR(td,"Estaciones")
transport = readOGR(td,"Corredores_para_Transporte_de_Pasajeros")

# Prep geometry
transport <- spTransform(transport, "+proj=longlat +datum=WGS84")

# 5. Get poverty data --------------------------------------------------------

## 7.1. Full data

# Source: Public data obtained from the Colombian for 2018
# National Administrative Department of Statistics (DANE)
# url = paste0('https://www.dane.gov.co/index.php/estadisticas-por-tema/',
#              'pobreza-y-condiciones-de-vida/pobreza-y-desigualdad')

# NOTE: full dataset covers all of Colombia and is very large (≈ 200 MB). 
# Smaller subsets or queryable versions do not appear to exist. Recommend using
# the subset I uploaded to google drive based on the procedure below. 

# url = paste0('http://geoportal.dane.gov.co/visipm/DANE_IPMxMZ.rar')

# # Create temporary directory
# td = tempdir()
# # Create placeholder file
# tf = tempfile(tmpdir=td, fileext=".zip")
# # Download into placeholder file
# download.file(url, tf, mode = "wb") #"mode='wb'" may not work on non-windows OS
# 
# # get the name of the first file in the zip archive
# # fname = unzip(tf, list=TRUE)$Name[1] # ipm_clase_1.cpg

# # unzip the file
# unzip(tf, exdir=td, overwrite=TRUE)

# ## Create subset for Medellin
# poverty = readOGR("td","ipm_clase_1")
# poverty = poverty[str_detect(poverty$cod_dane, '^05001')]

# Subet based on Medellín municipal code
# poverty = subset(poverty, str_detect(poverty$cod_dane, '^05001'))  

## Export subset
# setwd(base_dir)
# writeOGR(test_2, "Maps/poverty/medellin", "medellin_poverty", 
#          driver="ESRI Shapefile")

## Download subset
url = paste0('https://drive.google.com/u/0/uc?id=', 
             '1IiJPXZWDJz1JcueHXeGoUM1HK5V-kZCz&export=download')

# Create temporary directory
td = tempdir()
# Create placeholder file
tf = tempfile(tmpdir=td, fileext=".zip")
# Download into placeholder file
download.file(url, tf, mode = "wb") 

# get the name of the first file in the zip archive
# unzip(tf, list=TRUE)$Name[1]; # medellin_poverty.dbf

# # unzip the file
unzip(tf, exdir=td, overwrite=TRUE)

## Load data
poverty = readOGR(td,"medellin_poverty")

## Create barrio ID for merging 
# Based on Colombian statistics bureau DANE territorial codes. 
# For details, see DANE manual p8:
# paste0('https://www.dane.gov.co/files/sen/lineamientos/',
# 'manual-uso-marco-geoestadistico-nacional-en-proceso-estadistico.pdf')

poverty$CODIGO = substr(poverty$cod_dane, 15, 18) #cut out barrio code

## Positively skewed data
# plot(density(poverty$ipm, na.rm = T))
# abline(v=mean(poverty$ipm, na.rm = T), col="red")
# abline(v=mean(poverty$ipm, tr=.20, na.rm = T), col="darkgreen")
# abline(v=median(poverty$ipm, na.rm = T), col="blue")

# summary(poverty$ipm)
# sd(poverty$ipm, na.rm = T)

poverty_summarized = poverty@data %>%
  group_by(CODIGO) %>%
  summarize(ipm_med = median(ipm, na.rm = T),
            ipm_sd = sd(ipm, na.rm = T))

# 6. Get homicides data ------------------------------------------------

##### 8.1. Load data ###

# Source: Colombian police (DIJIN), open data. 
# "https://www.datos.gov.co/profile/bn7v-9rpg"

# Colombian open data location
gov_co = "https://www.datos.gov.co/resource/"


get_hom_dta <- function(dataset_id, df_year){
  x <- read.socrata(paste0(gov_co, dataset_id,
                           "?$where=municipio = 'MEDELLÍN (CT)'"))
  x <- x %>% select(
    starts_with('barrio'), zona, ends_with('sitio'), 
    edad, matches('sexo|genero'), matches('cantidad|X_*')) %>%
    `colnames<-`(c("barrio", "zone", "place", "age", "sex", "cases"))
  x$year <- df_year
  x$crime <- "homicide"
  #  x = select(x, -starts_with("X_"))
  return(x)
}

hom_2010 <- get_hom_dta("qs63-3nue.csv", "2010")
hom_2011 <- get_hom_dta("xh3s-ukky.csv", "2011")
hom_2012 <- get_hom_dta("tcwj-f6mb.csv", "2012")
hom_2013 <- get_hom_dta("t4mv-zwa6.csv", "2013")
hom_2014 <- get_hom_dta("fbrt-d6qx.csv", "2014")
hom_2015 <- get_hom_dta("cfga-dm6m.csv", "2015")
hom_2016 <- get_hom_dta("umwz-bt8i.csv", "2016")
hom_2017 <- get_hom_dta("mkw6-468s.csv", "2017")
hom_2018 <- get_hom_dta("pz7x-mkbs.csv", "2018")

hom_2019 <- read.socrata(paste0(gov_co, "9x54-wgwx.csv", # missing col names 
                                "?$where=unnamed_column_1 = 'MEDELLÍN (CT)'"))
hom_2019 <- hom_2019 %>% 
  select(unnamed_column_4:unnamed_column_6, unnamed_column_10, 
         unnamed_column_11, unnamed_column_18) %>%
  mutate(year = 2019, crime = "homicide")

names(hom_2019) <- names(hom_2010) # fix column names

hom_merged = rbind(hom_2010, hom_2011, hom_2012, hom_2013, hom_2014, hom_2015, 
                   hom_2016, hom_2017, hom_2018, hom_2019)
rm(hom_2010, hom_2011, hom_2012, hom_2013, hom_2014, hom_2015, # remove
   hom_2016, hom_2017, hom_2018, hom_2019)

##### 8.2. Prepping data ###

### fix strings
hom_merged$barrio = recode_names(hom_merged$barrio)
# sort(unique(hom_merged$barrio))

# names = unique(med_map$NOMBRE_BAR)
# names2 = unique(hom_merged$barrio)
# sort(unique(c(names, names2)))

hom_merged$barrio = gsub(' Palmitas', '', hom_merged$barrio) # last correction  

### Summarize murders by neighbourhood and year
homicides = hom_merged %>%
  group_by(barrio, year) %>%
  summarize(murders = sum(cases)) %>%
  pivot_wider(names_from = "year", names_prefix = "hom_",
              values_from = murders)
homicides$murders = rowSums(homicides[2:11], na.rm = T)

#### Compare neighborhood names with base map
# names_hom = hom_merged %>% select(barrio) %>% unique() %>% 
#   arrange(desc(barrio))# 374 obs
# names_hom; nrow(names_hom)
# names_med_map_barrios; nrow(names_med_map_barrios)
# 
# 
# comparison <- stringdist_join(names_med_map_barrios, names_hom,
#                               by = c(NOMBRE_BAR = "barrio"),
#                               max_dist = 2, mode = 'left', method = 'qgram')
# comparison[ with(comparison, order(NOMBRE_BAR, barrio)), ]
# comparison_2 = left_join(names_med_map, names_hom, by = c("NOMBRE_BAR" = "barrio"))
# Same number of rows

# 7. Get terrorism data ------------------------------------------------

##### 7.1. Load data ###

# To avoid problems with Spanish accents: 
# Sys.setlocale("LC_CTYPE", "spanish")
# Sys.setlocale(category="LC_ALL", locale = "en_US.UTF-8")

# Colombian open data location
gov_co = "https://www.datos.gov.co/resource/"

# Source: Colombian police (DIJIN), open data. 
# "https://www.datos.gov.co/profile/bn7v-9rpg"

get_ter_dta <- function(dataset_id, df_year){
  x <- read.socrata(paste0(gov_co, dataset_id,
                           "?$where=municipio = 'MEDELLÍN (CT)'"))
  x <- x %>% select(
    starts_with('barrio'), zona, ends_with('sitio'), 
    matches('20*|X_|cantidad')) %>%
    `colnames<-`(c("barrio", "zone", "place", "cases"))
  x$year <- df_year
  x$crime <- "terrorism"
  #  x = select(x, -starts_with("X_"))
  return(x)
}

ter_2010 <- get_ter_dta("pd5u-zqt3.csv", "2010")
ter_2011 <- get_ter_dta("kar7-y55g.csv", "2011")
ter_2012 <- get_ter_dta("2mnu-y5qu.csv", "2012")
ter_2013 <- get_ter_dta("eqb9-wneu.csv", "2013")
ter_2014 <- get_ter_dta("u3w6-xbe3.csv", "2014")
ter_2015 <- get_ter_dta("eq2h-makd.csv", "2015")
ter_2016 <- get_ter_dta("s6em-tsyp.csv", "2016")
ter_2017 <- get_ter_dta("u3w6-xbe3.csv", "2017")
# ter_2018 <- get_ter_dta("hmf8-7667.csv", "2018")# no cases

ter_2019 <- read.socrata(paste0(gov_co, "m762-bi7k.csv", # missing col names 
                                "?$where=unnamed_column_1 = 'MEDELLÍN (CT)'"))
ter_2019 <- ter_2019 %>% select(unnamed_column_5:unnamed_column_7, 
                                unnamed_column_12) %>%
  mutate(year = 2019, crime = 'terrorism')
names(ter_2019) <- names(ter_2010) # fix column names

ter_merged = rbind(ter_2010, ter_2011, ter_2012, ter_2013, ter_2014, ter_2015, 
                   ter_2016, ter_2017, ter_2019)
rm(ter_2010, ter_2011, ter_2012, ter_2013, ter_2014, ter_2015,
   ter_2016, ter_2017, ter_2019)

##### 7.2. Prepping data ###

# fix strings
ter_merged$barrio = recode_names(ter_merged$barrio)

#View(unique(sort(ter_merged$barrio))); View(unique(sort(med_map$NOMBRE_BAR)))

# names = unique(med_map$NOMBRE_BAR)
# names2 = unique(terrorism$barrio)
# sort(unique(c(names, names2)))

ter_merged$barrio = gsub('Uiniversidad', 'Universidad', 
                         ter_merged$barrio) # last correction  

### Summarize murders by neighborhood and year

terrorism = ter_merged %>%
  group_by(barrio, year) %>%
  summarize(terrorism = sum(cases)) %>%
  pivot_wider(names_from = "year", names_prefix = "ter_", 
              values_from = terrorism)
terrorism$terrorism = rowSums(terrorism[2:10], na.rm = T)

#### Compare neighborhood names with base map
# names_ter = ter_merged %>% select(barrio) %>% unique()

# comparison <- stringdist_join(names_med_map, names_ter, 
#                               by = c(NOMBRE_BAR = "barrio"),
#                               max_dist = 2, mode = 'left', method = 'qgram')
# comparison[ with(comparison, order(NOMBRE_BAR, barrio)), ]
# comparison_2 = left_join(names_med_map, names_ter, 
#                          by = c("NOMBRE_BAR" = "barrio"))
# Same number of rows 

# 8. Merge data --------------------------------------------------------------

# 8.1 Merge barrio spatial data

merged = med_map

## Homicide and terrorism data
merged@data = left_join(med_map@data, homicides, by = c("NOMBRE_BAR" = "barrio"))

merged@data = left_join(merged@data, terrorism, by = c("NOMBRE_BAR" = "barrio"))
# head(merged@data); nrow(merged@data)

merged@data = left_join(merged@data, poverty_summarized, by = 'CODIGO')

# Put neighbourhood names first for plotting purposes
merged@data = merged@data %>% select(NOMBRE_BAR, everything())

# 8.2 Merge comunas spatial data

## Get, merge homicide, poverty, and comuna data
data = merged@data %>% 
  select(NOMBRE_COM, NOMBRE_BAR,hom_2017, murders, ipm_med, ipm_sd)

data = data %>%  
  group_by(NOMBRE_COM) %>% select(everything(), -NOMBRE_BAR) %>%
  summarise_at(vars(hom_2017:murders, ipm_med:ipm_sd), 
               funs(sum(.,na.rm = T), mean(.,na.rm = T)))

comunas@data = left_join(comunas@data, data, by = c("NOMBRE" = "NOMBRE_COM"))

## Merge population data

comunas@data = left_join(comunas@data, comunas_pop, by = c("NOMBRE" = "c_digo"))


# 9. Prepare plotting and analysis ----------------------------------------

## Round numbers
comunas@data = comunas@data %>% mutate_if(is.numeric, round, digits=2)
merged@data = merged@data %>% mutate_if(is.numeric, round, digits=2)

## Add armed groups
comunas@data = comunas@data %>% mutate(armed_grps = 
  ifelse(NOMBRE == "Altavista", paste("AGC", "AN", "O", "Oth", sep = ", "),
  ifelse(NOMBRE == "Aranjuez", paste("T", "O", sep = ", "),
  ifelse(NOMBRE == "Belen", paste("AN", "O", sep = ", "),
  ifelse(NOMBRE == "Buenos Aires", paste("AGC", "O", sep = ", "),
  ifelse(NOMBRE == "Castilla", paste("O", "Oth", sep = ", "),
  ifelse(NOMBRE == "Doce De Octubre", paste("AN", sep = ", "),
  ifelse(NOMBRE == "El Poblado", paste("O", sep = ", "),
  ifelse(NOMBRE == "Guayabal", paste("O", sep = ", "),
  ifelse(NOMBRE == "La America", paste("O", sep = ", "),
  ifelse(NOMBRE == "La Candelaria", paste("CON", "O", sep = ", "),
  ifelse(NOMBRE == "Laureles", paste("O", sep = ", "),
  ifelse(NOMBRE == "Manrique", paste("O", sep = ", "),
  ifelse(NOMBRE == "Palmitas", paste("AGC", sep = ", "),
  ifelse(NOMBRE == "Popular", paste("O", "T", sep = ", "),
  ifelse(NOMBRE == "Robledo", paste("O", "T", sep = ", "),
  ifelse(NOMBRE == "San Antonio De Prado", paste("AGC", sep = ", "),
  ifelse(NOMBRE == "San Cristobal", paste("AGC", "Oth", sep = ", "),
  ifelse(NOMBRE == "San Javier", paste("AGC", "O", "Oth", sep = ", "),
  ifelse(NOMBRE == "Santa Cruz", paste("T", sep = ", "),
  ifelse(NOMBRE == "Santa Elena", paste("AGC", sep = ", "),
  ifelse(NOMBRE == "Villa Hermosa", 
  paste("AGC", "O", sep = ", "), NA))))))))))))))))))))))

comunas@data = comunas@data %>% 
  mutate(grp_agc = ifelse(str_detect(armed_grps,"\\bAGC\\b") == TRUE, 1,NA)) %>%
  mutate(grp_an = ifelse(str_detect(armed_grps,"\\bAN\\b") == TRUE, 1,NA)) %>%
  mutate(grp_con = ifelse(str_detect(armed_grps,"\\bCON\\b") == TRUE, 1,NA)) %>%
  mutate(grp_o = ifelse(str_detect(armed_grps,"\\bO\\b") == TRUE, 1,NA)) %>%
  mutate(grp_oth = ifelse(str_detect(armed_grps,"\\bOth\\b") == TRUE, 1,NA)) %>%
  mutate(grp_t = ifelse(str_detect(armed_grps,"\\bT\\b") == TRUE, 1,NA))


## 9.2 Separate and merge data 
# Cannot add to spatial data since every neighbourhood needs to occupy 1 row

# Separate data
merged_data = merged@data
comunas_data = comunas@data
comunas_data = comunas_data %>% select(NOMBRE, hom_2017_sum:grp_t)

# Merge
data = full_join(hom_merged, ter_merged)

data = left_join(merged_data, data,
                 by = c("NOMBRE_BAR" = "barrio"))
# data = left_join(data, merged_data,
#                         by = c("barrio" = "NOMBRE_BAR"))

data = left_join(data, comunas_data,
                 by = c("NOMBRE_COM" = "NOMBRE"))

## For plotting homicide locations
data = data %>% 
  mutate(place = str_to_title(place)) %>% 
  mutate(place = ifelse(grepl(".*Via*", place), "Public Road",
                        ifelse(grepl(".*Habitacion*", place), 
                               "Private location",
                               ifelse(grepl(".*Apartamento*", place), 
                                      "Private location",
                                      "Other Public Location"))))


# 10. Finalize and save data ----------------------------------------------

## Remove unneeded data 
rm(names_hom, names_ter, names_med_map_barrios, names_med_map_comunas, names_comunas,
   med_map, med_map_data, homicides, hom_merged, poverty, poverty_summarized,
   ter_merged, terrorism, comunas_pop, polysCRS)


## Save data
save.image("medellin_analysis_data_15_06_2020.RData")

Homicides 2010-2019

The following graphs provide essential descriptive data on homicide rates in the city and which groups are the most affected (use the tabs below to navigate between them). Homicide rates are taken from the open data of the Colombian national police (DIJIN).¹ The data records a total of 7520 homidices during the entire period.

By year

Although homicide rates have fallen by more than half since the early 2010’s, they remain high. With a population of 2.7 million in 2015,² the murder rate in that year was 1077.415 per 100,000 inhabitants, which was still lower than the national average that year of 26.5/100,000.³

## By year

desc_year = ggplot(data=subset(data, !is.na(year)),
       aes(x = year, y = cases)) + geom_col(fill = 'steelblue') +
  ylab("Homicides") + xlab("Year") +
  theme_minimal()

desc_year

By gender

As can be seen below, the overwhelming majority of victims are male.

## By gender

desc_gender = ggplot(data=subset(data, !is.na(sex)),
       aes(x = sex, y = cases)) + geom_col(fill = 'steelblue') +
  ylab("Homicides") + xlab("Gender") +
  scale_x_discrete(labels=c("Female", "Male")) +
  theme_minimal()

desc_gender

By age

Most murder victims are young, with an median age of 28.

# By age

desc_age = ggplot(data=subset(data, !is.na(age)),
       aes(x = age, y = cases)) + geom_col(fill = 'steelblue') +
  ylab("Homicides") + xlab("Age") +
  scale_x_continuous(
    breaks = round(seq(min(na.omit(data$age)),
                       max(na.omit(data$age)), by = 5),-1)) +
  theme_minimal()

desc_age

By location

The overwhelming majority of homicides were reported as taking place in public, mostly on the city streets, indicating that public places remain potentially dangerous.

## By location

desc_location = ggplot(data=subset(data, !is.na(place)),
       aes(x = reorder(place, -cases), y = cases)) +
  geom_col(fill = 'steelblue') +
  ylab("Homicides") + xlab("Location") +
  theme_minimal()

desc_location

Plotting homocides rates

The following plots map the distribution of homicides by neighbourhoods (known as ‘barrios’ in urban areas and ‘veredas’ in rural zones, delimited here by black borders), and communes (urban ‘comunas’ or rural ‘corregimientos’, delimited by thicker borders). Map data was collected using open data from the municipality of Medellín.⁴ Because of inconsistencies in how administrative units were coded, 41 homicides could be matched to the correct map areas, meaning that actual homicide rates are higher.

Total homicides

Most homicides appear to be concentrated in a few areas of the city. Comuna 13 to the west of the centre has historically been a stronghold for paramilitary and guerilla groups, and saw the most intense urban warfare during the conflict. Most guerillas were expelled from here by major military operations in 2002. However, armed groups of various affiliations and backgrounds continue to inhabit many parts of the city, most notably comunas 1 and 3 to the north east, as well as comunas 5 and 6 to the north west.⁵

#echo=FALSE, warning=FALSE}

tmap_mode("view")
tmap_options(basemaps = c("OpenStreetMap", "Esri.WorldGrayCanvas", 
                          "Esri.WorldTopoMap"))

tm_shape(merged, alpha = 0.5) +
  tm_borders(col = "#666666") +
  tm_fill(col = "murders", style = "fixed", palette = "YlOrRd",
          breaks = c(1, 5, 10, 20, 50, 100, 150, 200), alpha = 0.5,
          showNA = F, title = 'Homicides',
          popup.vars = c('Homocides 2010-2019' = 'murders',
                         'Comuna' = 'NOMBRE_COM')) +
  tm_shape(comunas) +
  tm_borders(col = "#336666", lwd = 2) +
  tm_text(text = "COMUNA", size = 1.5, style = "pretty",
          shadow = T, alpha = 0.7, overwrite.lines = T)

# Warning: "CRS object has comment, which is lost in output" is expected:
# http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html

Early vs late 2010’s

This plot shows the difference in homicide rates between the early (2010-2012) and late (2017-2019) period of the last decade. The city has seen a notable decline in homicides, however, as this comparison indicates, violence remains lodged in the same areas of the city. A notable exception is comuna 13, which has seen a dramatic decline in violence, and is now a popular tourism destination.

# echo=FALSE, warning=FALSE

# Create comparison of years 2010-2012 vs 2017-2019                    
merged$hom_2010_2012 = rowSums(merged@data[c
                                          ('hom_2010', 'hom_2011', 'hom_2012')],
                               na.rm = T)
merged$hom_2010_2012[merged$hom_2010_2012 == 0] = NA

merged$hom_2017_2019 = rowSums(merged@data[c
                                          ('hom_2017', 'hom_2018', 'hom_2019')],
                               na.rm = T)
merged$hom_2017_2019[merged$hom_2017_2019 == 0] = NA

hom_map_comp = tm_shape(merged, alpha = 0.5) +
  tm_borders(col = "#666666") +
  tm_fill(col = c('hom_2010_2012', 'hom_2017_2019'), style = "fixed",
          palette = "YlOrRd",
          breaks = c(1, 3, 5, 10, 20, 30, 60, 70, 100), alpha = 0.5,
          showNA = F, title = c("2010-2012", "2017-2019"),
          popup.vars = c('Homicides 2010-2012' = "hom_2010_2012",
                         'Homicides 2017-2019' = "hom_2017_2019",
                         'Homicides 2010-2019 (total)' = 'murders',
                         'Comuna' = 'NOMBRE_COM')) +
  tm_facets(ncol = 2, nrow = 1)+ #free.coords = FALSE) +
  tm_shape(comunas) +
  tm_borders(col = "#336666", lwd = 2) +
  tm_text(text = "COMUNA", size = 1.5, style = "pretty",
          shadow = T, alpha = 0.7, overwrite.lines = T)

hom_map_comp

Terrorism 2010-2019

The following is a map of events classified as acts of terrorism by Colombian police. Most events occured in the peripheries of the city, and central parts of the city are relatively unaffected.

ter_map = tm_shape(merged, alpha = 0.5) +
  tm_borders(col = "#666666") +
  tm_fill(col = "terrorism", style = "cat", palette = "YlOrRd",
          alpha = 0.5, showNA = F, title = 'Terrorism',
          popup.vars = c('Terrorism 2010-2019' = 'terrorism',
                         'Homocides 2010-2019' = 'murders',
                         'Comuna' = 'NOMBRE_COM')) +
  tm_shape(comunas) +
  tm_borders(col = "#336666", lwd = 2) +
  tm_text(text = "COMUNA", size = 1.5, style = "pretty",
          shadow = T, alpha = 0.7, overwrite.lines = T)

ter_map

Factoring in population factors

Homicide rates standardized by population - 2017

Homicide rates may appear concentrated in the central areas of the city, yet this may be (partly) due to those areas having a higher population. The Standardized Morbidity Ratio (SMR) is a common approach in disease research, used to calculate the ratio of observed deaths in a group compare to the expected deaths in the general population.

Areas with an SMR above one exceed expected homicide rates while those below have less than expected, based on their population. The map below shows the SMR of homicides in 2017 (SMR is indicated by shade of the region, while the bubbles indicate homicide rates per neighborhood). Rural locations (mostly located in area codes 50 through 90) appear to have quite high homicide rates taking into account their smaller populations, yet the central district of Medellín still display a very high SMR, despite its high population density.

# Compute and print the overall homicide rates
r <- sum(na.omit(comunas$hom_2017_sum)) / sum(na.omit(comunas$pop_2017))
options(scipen=999)
# r

# Calculate the expected number for each borough
comunas$hom_exp <- comunas$pop_2017 * r

# Calculate the ratio of OBServed to EXPected
comunas$hom_smr <- comunas$hom_2017_sum / comunas$hom_exp

# Plot
tm_shape(comunas, alpha = 0.5) +
  tm_borders(col = "#336666",  lwd = 2) +
  tm_fill(col = "hom_smr", style = "fixed",
          breaks = c(0, 0.16, 0.5, 1, 1.6, 2, 5, 10), midpoint = 1,
          alpha = 0.5,
          showNA = F, title = 'SMR - Homicides',
          palette = "-RdYlGn", n = 7, contrast = c(0, 1),
          popup.vars = c('Expected homicides' = 'hom_exp',
                         'Homocides 2017' = 'hom_2017_sum',
                         'Homocides SMR' = 'hom_smr',
                         'Comuna' = 'NOMBRE',
                         'Population' = 'pop_2017')) +
  tm_text(text = "COMUNA", size = 1.5, style = "pretty",
          shadow = T, alpha = 0.7, overwrite.lines = T) +
  tm_shape(merged) +
    tm_borders(col = "#666666") +
    tm_bubbles(size = "hom_2017", col = "hom_2017", style = "fixed",
             palette = "YlOrRd", showNA = F, alpha = 0.5,
             title.col = 'Homicides 2017',
             popup.vars = c('Homocides 2017' = 'hom_2017',
                         'Barrio' = 'NOMBRE_BAR',
                         'Comuna' = 'NOMBRE_COM'),
             breaks = c(1, 2, 3, 5, 10, 20, 38))

Presence of Paramilitary Groups - 2017

Despite having regained much control of the city since the height of the conflict, several paramilitary and/or criminal groups occupy Medellín. These groups are engaged in drug operations and other criminal enterprises throughout the city.

The following map represents where the most notable groups were active in 2017, based on research by the organization Coordinación Colombia Europa Estados Unidos.⁶ The research reports that confrontations between the heavily armed paramilitary groups ACN, groups under the command of the AGC, as well as with the police, led to heavy severe violence, particularly in the border between areas 70 and 16. These confrontations likely explain the high concentrations of homicides in this region.

# Plot
tm_shape(comunas, alpha = 0.5) +
  tm_borders(col = "#336666",  lwd = 2) +
  tm_fill(col = "hom_smr", style = "fixed",
          breaks = c(0, 0.16, 0.5, 1, 1.6, 2, 5, 10), midpoint = 1,
          alpha = 0.5,
          showNA = F, title = 'SMR - Homicides',
          palette = "-RdYlGn", n = 7, contrast = c(0, 1),
          popup.vars = c('Expected homicides' = 'hom_exp',
                         'Homocides 2017' = 'hom_2017_sum',
                         'Homocides SMR' = 'hom_smr',
                         'Comuna' = 'NOMBRE',
                         'Population' = 'pop_2017')) +
  tm_text(text = "COMUNA", size = 1.5, ymod = 1.5,
          shadow = T, alpha = 0.7, overwrite.lines = T) +
  tm_shape(comunas) + tm_symbols(size = 'grp_agc', col = 'grp_agc', scale = 0.5,
                                 palette = "Blues",contrast = c(0.75, 1),
                                 legend.col.show = F) +
  tm_shape(comunas) + tm_squares(size= "grp_an", col = 'grp_an', scale = 0.5,
                                 xmod = -1, palette = "Greens",
                                 contrast = c(0.75, 1),legend.col.show = F) +
  tm_shape(comunas) + tm_squares(size= "grp_con", col = 'grp_con', scale = 0.5,
                                 ymod = -1, legend.col.show = F,
                                 palette = "RdPu", contrast = c(0.35, 1)) +
  tm_shape(comunas) + tm_squares(size= "grp_o", col = 'grp_o', scale = 0.5,
                                 ymod = -1, xmod = -1, legend.col.show = F,
                                 palette = "OrRd", contrast = c(0.75, 1)) +
  tm_shape(comunas) + tm_squares(size= "grp_oth", col = 'grp_oth', scale = 0.5,
                                 ymod = -1, xmod = 0, legend.col.show = F,
                                 palette = "Greys", contrast = c(0.35, 1)) +
  tm_shape(comunas) + tm_squares(size= "grp_t", col = 'grp_t', scale = 0.5,
                                 ymod = -1, xmod = -1, legend.col.show = F,
                                 palette = "YlOrRd", contrast = c(0.25, 1)) +
  tm_add_legend("fill",
                labels = c("AGC - Clan del Golfo", "ACN - Alianza Criminal del Norte",
                           "La Oficina \n del Valle de Aburrá", "CONVIVIR",
                           "Other Groups", "Triana"),
                col = c("#3E68FC", "#0A8200", "#CA0A00", "#F9B2F9", "#DADAD8", 
                        "#EAF000"),
                title = "Paramilitary groups")

Military groups source: Coordinación Colombia Europa Estados Unidos: Presencia de grupos paramilitares y algunas de sus dinámicas en Antioquia, 2017, p:78, Corporación Jurídica Libertad, Fundación Sumapaz"

Confidence intervals

This graph plots the confidence intervals of areas that are in exceedence of expected homicide rates. Four areas clearly stand out as having higher than expected homicide rates for their population, namely comunas 10 and 13, and corregimientos 50 and 90.

# For the binomial statistics function
library(epitools)

# Get CI from binomial distribution
hom_ci <- binom.exact(comunas$murders_sum, comunas$pop_2017)

# Add names
hom_ci$NAME <- comunas$COMUNA

# Calculate comunas rate, then compute SMR
r <- sum(comunas$murders_sum) / sum(comunas$pop_2017)
hom_ci$SMR <- hom_ci$proportion / r

# Subset the high SMR data
hom_high <- hom_ci[hom_ci$SMR > 1, ]

# Plot estimates with CIs
ggplot(hom_high, aes(x = NAME, y = proportion / r,
                     ymin = lower / r, ymax = upper / r)) +
  geom_pointrange() +
  ylab("SMR") + xlab("Comuna") +
  # theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme_minimal()

Distribution of mass transportation network

The following plot displays the transport of the mass-transportation network of Medellín, the expansion of which into the more conflict-affected parts of the city is often credited with accomplishing its reduction in violence. From the current map, however, it is unclear whether such an affect is visible. The highly connected central parts of the city still have some of the highest rates of violence.

# tmap_mode("view")

transport_map = tm_shape(merged, alpha = 0.5) +
  tm_borders(col = "#666666") +
  tm_fill(col = "murders", style = "fixed", palette = "YlOrRd",
          breaks = c(1, 5, 10, 20, 50, 100, 150, 200), alpha = 0.5,
          showNA = F, title = 'Homicides',
                    popup.vars = c('Homocides 2010-2019' = 'murders',
                         'Comuna' = 'NOMBRE_COM')) +
tm_shape(comunas) +
  tm_borders(col = "#336666", lwd = 2) +
  tm_text(text = "COMUNA", size = 1.5, style = "pretty",
          shadow = T, alpha = 0.7, overwrite.lines = T) +
tm_shape(transport) +
    tm_lines(col = '#0080FF',lwd = 1) #col = "LINEA") # transport

transport_map

Exploring poverty rates

The next plot displays poverty rates, using the multidimensional poverty measure developed by the Colombian statistics agency DANE to provide informative measures of poverty at the local level. The measure ranges from 0 (no poverty) to 100 (very poor) and is based on 15 indicators across 5 dimensions. The dimensions are: access to education; conditions for children and youth; employment; health; as well as household conditions and access to public services.⁷ The following maps plot average poverty rates per neighbourhood (rural parts of the municipality are not included in the data).

Poverty rates in 2018

There clearly appears to be a relation between poverty and homicide rates in Medellín, with poor, peripheral areas such as comunas 5, 6 and 13 being most heavily affected, and more affluent areas such as comuna 14 being relatively spared from violence. The relationship does not appear to be a completely straight-forward one, however, with many of the poorest areas of the city being relatively unaffected by violence, such as comuna 1.

This suggests that other factors may be important to consider other factors. One such factor may be the extent of control that various armed groups have in different areas, which tends to be in poorer parts of the city. In many conflict affected cities, including Medellín, armed groups often establish a monopoly of violence and even take over other functions of authority from the state, which has been known to contribute to an (uneasy) peace in such areas.⁸

# tmap_mode("view")

pov_map_1 = tm_basemap(c("Esri.WorldTopoMap", "OpenStreetMap", "Esri.WorldGrayCanvas")) +
tm_shape(merged, alpha = 0.5) +
  tm_borders(col = "black") +
  tm_fill(col = 'ipm_med', alpha = 0.7, palette = "BuPu",
          title = "Median poverty",
          showNA = F, group = 'Poverty', zindex = 500,
          popup.vars = c('Homicides 2010-2019' = 'murders',
                         'Comuna' = 'NOMBRE_COM',
                         'Median poverty' = 'ipm_med',
                         'Poverty variation' = 'ipm_sd')) +
  tm_bubbles(size = "murders", col = "murders", style = "fixed",
             palette = "YlOrRd", showNA = F, alpha = 0.9,
             title.col = 'Homicides 2010-2019', group = 'Homicides',
             breaks = c(1, 5, 10, 20, 50, 100, 150, 200)) +
  tm_shape(comunas) + # Comunas data
  tm_borders(col = "#336666", lwd = 2) +
  tm_text("COMUNA", size = 1.5, style = "pretty")

pov_map_1

Comparing poverty and inequality

Although poverty is often identified as a cause of conflict, inequalities may be at least as important, particularly between groups in close vicinity of each other.⁹ This map plots the variation in poverty rates within a neighbourhood (estimated by calculating the standard deviation of poverty), to measure inequalities within them.

Inequality appears to explain some of the incongruities about poverty discussed earlier. The low violence observed in the poor neighbourhoods of comuna 1 may be explained by how inequalities in the area are relatively low. Similarly, the high levels of inequality found in the central parts of the city (comuna 10) may explain the violence found there.

pov_map_2 = tm_basemap(c("Esri.WorldTopoMap", "OpenStreetMap", "Esri.WorldGrayCanvas")) +
tm_shape(merged, alpha = 0.5) +
  tm_borders(col = "black") +
  tm_fill(col = c('ipm_med', 'ipm_sd'), alpha = 0.7, palette = "BuPu",
          title = c("Median poverty", "Inequality"),
          showNA = F, group = 'Poverty', zindex = 500,
          popup.vars = c('Homicides 2010-2019' = 'murders',
                         'Comuna' = 'NOMBRE_COM',
                         'Median poverty' = 'ipm_med',
                         'Inequality' = 'ipm_sd')) +
  tm_bubbles(size = "murders", col = "murders", style = "fixed",
             palette = "YlOrRd", showNA = F, alpha = 0.9,
             title.col = 'Homicides', group = 'Homicides',
             breaks = c(1, 5, 10, 20, 50, 100, 150, 200)) +
  tm_facets(ncol = 2, nrow = 1)+
  tm_shape(comunas) + # Comunas data
  tm_borders(col = "#336666", lwd = 2) +
  tm_text("COMUNA", size = 1.5, style = "pretty") #+

pov_map_2

Analysing homicide predictors

corr_ipm_med = cor.test(merged$murders, merged$ipm_med, 
                     method = "pearson")

corr_ipm_sd = cor.test(merged$murders, merged$ipm_sd, 
                     method = "pearson")

Running a pearson correlation on total homicides from 2010-2019, median poverty had a correlation coefficient of 0.044 and p-value of 0.54, while inequality shows a coefficient of 0.166 and p-value of 0.02, which clearly indicates that inequality matters more than poverty in this case. The scatter plot in the next tab shows a slight correlation between homicides and inequality, indicating that inequality is an important yet not conclusive predictor.

The following are multilevel models of homicide rates by administrative region in 2017, 2018, and 2019 respectively, offset for population in 2017. For predictors, the models used dummy-variables for the presence of paramilitary groups, another dummy for to indicate whether homicides occured in urban or rural areas of the municipality, as well as an indicator of inequality rates.

The results show that inequality appears to have a low but consistent positive correlation with homicide rates over the years. The effects that the presence of paramilitary groups have on homicide rates appear less clear-cut, with each group varying significantly from year to year. This is likely due to the lack of more fine-grained data than the comuna/corregimiento-level. However, Alianza Criminal del Norte appears to be the most violent group in this data. As tends to be the case in Colombia, homicides appear to be slightly more likely to occur in rural compared to urban areas.

Multilevel model homicides 2017-2019

# Prepare dummy variables
data$grp_agc[is.na(data$grp_agc)] = 0
data$grp_an[is.na(data$grp_an)] = 0
data$grp_con[is.na(data$grp_con)] = 0
data$grp_o[is.na(data$grp_o)] = 0
data$grp_oth[is.na(data$grp_oth)] = 0
data$grp_t[is.na(data$grp_t)] = 0

# Multilevel model of homicides in 2017
m1 = lmer(hom_2017 ~ ipm_sd + zone + grp_agc + grp_an +
              grp_con + grp_o + grp_oth + grp_t +
            (1|COMUNA), offset = log(pop_2017), data = data)

m2 = lmer(hom_2018 ~ ipm_sd + zone + grp_agc + grp_an +
              grp_con + grp_o + grp_oth + grp_t +
            (1|COMUNA), offset = log(pop_2017), data = data)

m3 = lmer(hom_2019 ~ ipm_sd + zone + grp_agc + grp_an +
              grp_con + grp_o + grp_oth + grp_t +
            (1|COMUNA), offset = log(pop_2017), data = data)
# screenreg(m1, digits=3)
# screenreg(m2, digits=3)

htmlreg(list(m1, m2, m3), digits=3,
        custom.model.names = c("2017", "2018", "2019"),
        custom.coef.names = 
          c("Intercept", "Inequality", "Urban zone",
            "AGC - Clan del Golfo", "ACN - Alianza Criminal del Norte",
            "La Oficina \n del Valle de Aburrá", "CONVIVIR",
            "Other Groups", "Triana"))

Statistical models
	2017	2018	2019
Intercept	-11.063^***	-13.436^***	-11.469^***
	(1.527)	(1.335)	(1.265)
Inequality	0.044^**	0.227^***	0.126^***
	(0.015)	(0.008)	(0.007)
Urban zone	-0.979	-0.933^*	-0.467
	(0.816)	(0.372)	(0.287)
AGC - Clan del Golfo	-0.708	-1.450	-0.344
	(0.926)	(0.912)	(0.882)
ACN - Alianza Criminal del Norte	4.207^***	3.003^*	2.214
	(1.170)	(1.187)	(1.150)
La Oficina del Valle de Aburrá	9.126^***	1.916	1.458
	(1.317)	(1.355)	(1.321)
CONVIVIR	2.244^*	2.798^*	0.913
	(1.143)	(1.159)	(1.112)
Other Groups	1.440	1.290	-0.048
	(1.005)	(1.020)	(0.993)
Triana	0.988	-1.231	0.007
	(0.908)	(0.982)	(0.860)
AIC	30354.404	14533.490	15389.079
BIC	30425.135	14600.218	15457.047
Log Likelihood	-15166.202	-7255.745	-7683.539
Num. obs.	4583	3185	3565
Num. groups: COMUNA	16	15	16
Var: COMUNA (Intercept)	1.335	1.520	1.455
Var: Residual	43.674	5.499	4.299
p < 0.001; p < 0.01; p < 0.05

Scatter plot

# grubbs.test(merged$ipm_sd[merged$ipm_sd < 33], type = 10) # Outlier

# Create a scatter plot of tree_density vs avg_canopy
scatter_plot = merged@data %>% filter(ipm_sd < 33)

scatter_plot = ggplot(scatter_plot, aes(x = ipm_sd, y = murders)) + 
    geom_point() + 
    stat_smooth() +
  ylab("Homicides") + xlab("Inequality") +
  theme_minimal()

scatter_plot

Conclusions

This brief analysis has explored how violence is spread throughout Medellín, and how it might be explained. Homicides have dropped dramatically in the last decade yet remain troubling, being particularly concentrated among certain demographics (especially young males) as well as in certain areas of the city. This analysis cannot give any straight-forward explanations to these observations, however it appears that violence remains concentrated in the areas of the city that have historically been (and often continues to be) occupied by armed groups.

Despite its good reputation, the expansion of infrastructure projects in these areas does not appear to have to had a direct impact on homicide rates. However this may had an indirect effect by alleviating poverty and inequality, which appear to be other important explanations for why homicides concentrate in certain areas. In order to make conclusions with more certainty, more data would likely have to be collected and statistical analyses carried out (multilevel models, or spatial analysis methods may be particularly useful).

See data collection section above for details.↩
Municipality of Medellín: https://www.medellin.gov.co/irj/go/km/docs/wpccontent/Sites/Subportal%20del%20Ciudadano/Plan%20de%20Desarrollo/Secciones/Informaci%C3%B3n%20General/Documentos/POT/medellinPoblacion.pdf ↩
United Nations Office of Drugs and Crime: https://dataunodc.un.org/crime/intentional-homicide-victims ↩
Municipality of Medellín’s open data repository, at: https://geomedellin-m-medellin.opendata.arcgis.com/datasets ↩
For further details, see: Sampaio, 2019: The role of power for non-state armed groups in cities. Url: https://doi.org/10.1080/23802014.2019.1669487 ↩
Coordinación Colombia Europa Estados Unidos: Presencia de grupos paramilitares y algunas de sus dinámicas en Antioquia, 2017, p:78. Corporación Jurídica Libertad, Fundación Sumapaz".↩
For further information, visit DANE’s website, url: https://www.dane.gov.co/index.php/en/statistics-by-topic-1/poverty-and-life-conditions/pobreza-y-desigualdad ↩
For further details, see: Sampaio, 2019: The role of power for non-state armed groups in cities. Url: https://doi.org/10.1080/23802014.2019.1669487 ↩
See Brown and Langer, 2010: https://doi.org/10.1080/14678800903553837 ↩

Mapping Violence in Medellín: an exploratory spatial analysis of a post-conflict city in transformation

Simon Saldner - ssaldner@gmail.com

07-06-2020