In this analysis, I generate a new variable that will be included in my final model of the HIV risk environment for IDUs in Chicago: the count of IDU-related arrests per Chicago Community Area in 2016. The specific arrests that I am interested in are those for crimes characterized as “possession of hypodermic needle,” “sale/delivery of hypodermic needle,” and “possession of drug equipment.” Research has indicated that laws and policing initiatives can heighten HIV risk among IDUs by affecting syringe exchange use and accessibility and by potentially causing mixing between IDUs’ networks and groups (Burris et al. 2004; Rhodes et al. 2009). In particular, on-the-ground enforcement of laws prohibiting the possession of drug paraphernalia, together with high arrest rates, has been shown to decrease the carrying of syringes and equipment by IDUs, which increases opportunities for sharing (Burris et al. 2004; Rhodes et al. 2009). By generating this IDU-related arrest count variable, I hope to gain a better understanding of the HIV risk landscape in the Chicago area.
The data that I will be using to construct the IDU-related arrest count variable are “Crimes - 2016” and “Boundaries - Community Areas (current),” both from the Chicago Data Portal. The “Crimes - 2016” dataset includes reported incidents of crime that took place in Chicago in 2016, drawn from the Chicago Police Department’s CLEAR (Citizen Law Enforcement Analysis Reporting) system. Important pieces of this dataset include the year of the crime (2016), the IUCR (Illinois Uniform Crime Reporting) code, the description of this code and its “Primary Type,” arrest information, and the longitude and latitude coordinates of each crime. This dataset can be found here. The “Boundaries - Community Areas (current)” dataset is spatial data consisting of the current community area boundaries in Chicago and can be downloaded as a shapefile here.
To generate the new variable, the count of IDU-related arrests per Chicago Community Area in 2016, I EXTRACT the csv data file of Chicago crime from 2016 and the Chicago community area boundaries shapefile, TRANSFORM these datasets through geoprocessing, and LOAD a cleaned IDU-related crime data csv, an IDU-related crime points shapefile, and a shapefile that includes the arrest count per community area. This process ultimately enables me to map the count variable as a thematic map. The steps of my ETL workflow can be seen below.
To begin my analysis, I first load all the libraries that will be used throughout this spatial analysis process, as seen below.
library(sf)
library(tmap)
library(leaflet)
library(data.table)
library(tidyverse)
library(tidyr)
library(dplyr)
Here, I am reading in the crime csv so I can bring it into my R environment.
ChicagoCrime <- fread("Crimes_-_2016.csv", header = TRUE)
Before beginning any analysis, I need to inspect the dataset so I can get an idea of what I am working with and understand what steps are needed to clean and wrangle the data.
glimpse(ChicagoCrime)
## Rows: 269,534
## Columns: 22
## $ ID <int> 11645836, 11043021, 11243066, 11243020, 112279…
## $ `Case Number` <chr> "JC212333", "JA367631", "JB168427", "HZ184094"…
## $ Date <chr> "05/01/2016 12:25:00 AM", "10/19/2016 07:00:00…
## $ Block <chr> "055XX S ROCKWELL ST", "075XX S YATES BLVD", "…
## $ IUCR <chr> "1153", "0610", "1153", "0281", "1154", "2820"…
## $ `Primary Type` <chr> "DECEPTIVE PRACTICE", "BURGLARY", "DECEPTIVE P…
## $ Description <chr> "FINANCIAL IDENTITY THEFT OVER $ 300", "FORCIB…
## $ `Location Description` <chr> "", "RESTAURANT", "OTHER", "RESIDENCE PORCH/HA…
## $ Arrest <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ Domestic <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ Beat <int> 824, 421, 332, 1712, 513, 2221, 311, 1831, 151…
## $ District <int> 8, 4, 3, 17, 5, 22, 3, 18, 15, 7, 14, 10, 17, …
## $ Ward <int> 15, 7, 5, 39, 9, 19, 20, 42, 29, 6, 32, 12, 30…
## $ `Community Area` <int> 63, 43, 43, 13, 49, 72, 40, 8, 25, 69, 22, 30,…
## $ `FBI Code` <chr> "11", "05", "11", "02", "11", "26", "11", "11"…
## $ `X Coordinate` <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Y Coordinate` <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ Year <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016…
## $ `Updated On` <chr> "04/06/2019 04:04:43 PM", "08/05/2017 03:50:08…
## $ Latitude <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ Longitude <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ Location <chr> "", "", "", "", "", "", "", "", "", "", "", ""…
dim(ChicagoCrime)
## [1] 269534 22
nrow(ChicagoCrime)
## [1] 269534
ncol(ChicagoCrime)
## [1] 22
Here, I can see that this dataset contains important information on the type of each crime and the IUCR code used to identify it. Although the values shown in this output chunk include NA values for some latitude and longitude coordinates, there are no NA values for the crimes I am interested in analyzing in this model. I can also see here that this dataset has 269,534 rows and 22 columns.
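The claim about missing coordinates can be checked directly rather than read off the preview. The following is a minimal sketch, assuming the three IUCR codes used later in this analysis (2110, 2111, and 2170). The names `toy_crime`, `idu_codes`, and `missing_coords` are hypothetical, and the toy values are made up so that the block runs on its own; with the real data, `toy_crime` would be replaced by `ChicagoCrime`.

```r
library(dplyr)

# Toy stand-in for ChicagoCrime; every value here is made up for illustration
toy_crime <- data.frame(
  IUCR      = c("1153", "2170", "2110", "0610"),
  Latitude  = c(NA, 41.88, 41.78, NA),
  Longitude = c(NA, -87.73, -87.66, NA)
)

# IUCR codes of interest (assumed from the filtering step later in the analysis)
idu_codes <- c("2110", "2111", "2170")

# Count the records of interest and how many of them lack a coordinate
missing_coords <- toy_crime %>%
  filter(IUCR %in% idu_codes) %>%
  summarise(
    n_records = n(),
    n_missing = sum(is.na(Latitude) | is.na(Longitude))
  )

missing_coords
```

If `n_missing` is zero for the real data, every record of interest can be mapped as a point without dropping rows.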
Now that I have identified which IUCR codes to include in my variable, I want to make sure that the crime data is a data frame so I can begin to clean and subset it.
ChicagoCrime.df<-as.data.frame(ChicagoCrime)
To begin subsetting and cleaning the data, I first filter for the IDU-related crimes: those with IUCR codes 2110, 2111, and 2170.
IDUCrimes = filter(ChicagoCrime.df, IUCR %in% c("2111","2110","2170"))
glimpse(IDUCrimes)
## Rows: 183
## Columns: 22
## $ ID <int> 10365037, 10365454, 10365881, 10373041, 103826…
## $ `Case Number` <chr> "HZ100560", "HZ101106", "HZ101788", "HZ109136"…
## $ Date <chr> "01/01/2016 12:35:00 PM", "01/01/2016 10:00:00…
## $ Block <chr> "040XX W WILCOX ST", "064XX S ASHLAND AVE", "0…
## $ IUCR <chr> "2170", "2170", "2170", "2170", "2170", "2170"…
## $ `Primary Type` <chr> "NARCOTICS", "NARCOTICS", "NARCOTICS", "NARCOT…
## $ Description <chr> "POSSESSION OF DRUG EQUIPMENT", "POSSESSION OF…
## $ `Location Description` <chr> "ALLEY", "STREET", "STREET", "ALLEY", "POLICE …
## $ Arrest <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
## $ Domestic <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ Beat <int> 1115, 725, 2531, 1922, 434, 815, 1421, 112, 25…
## $ District <int> 11, 7, 25, 19, 4, 8, 14, 1, 25, 8, 8, 15, 4, 1…
## $ Ward <int> 28, 17, 29, 47, 10, 23, 35, 42, 37, 16, 17, 28…
## $ `Community Area` <int> 26, 67, 25, 6, 51, 56, 22, 32, 25, 66, 66, 25,…
## $ `FBI Code` <chr> "18", "18", "18", "18", "18", "18", "18", "18"…
## $ `X Coordinate` <int> 1149482, 1166774, 1137152, 1163818, 1192874, 1…
## $ `Y Coordinate` <int> 1899030, 1862112, 1911278, 1925275, 1837123, 1…
## $ Year <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016…
## $ `Updated On` <chr> "02/10/2018 03:50:01 PM", "02/10/2018 03:50:01…
## $ Latitude <dbl> 41.87886, 41.77720, 41.91270, 41.95059, 41.708…
## $ Longitude <dbl> -87.72659, -87.66416, -87.77157, -87.67321, -8…
## $ Location <chr> "(41.878858346, -87.726594936)", "(41.77719854…
I also want to filter these IDU-related crimes to include only those where an arrest was made, since I am specifically interested in how law enforcement and policing affect IDUs’ HIV risk environment, especially in relation to the danger of being caught carrying a syringe or drug-injection equipment.
IDU_arrests = filter(IDUCrimes, Arrest == TRUE)
glimpse(IDU_arrests)
## Rows: 183
## Columns: 22
## $ ID <int> 10365037, 10365454, 10365881, 10373041, 103826…
## $ `Case Number` <chr> "HZ100560", "HZ101106", "HZ101788", "HZ109136"…
## $ Date <chr> "01/01/2016 12:35:00 PM", "01/01/2016 10:00:00…
## $ Block <chr> "040XX W WILCOX ST", "064XX S ASHLAND AVE", "0…
## $ IUCR <chr> "2170", "2170", "2170", "2170", "2170", "2170"…
## $ `Primary Type` <chr> "NARCOTICS", "NARCOTICS", "NARCOTICS", "NARCOT…
## $ Description <chr> "POSSESSION OF DRUG EQUIPMENT", "POSSESSION OF…
## $ `Location Description` <chr> "ALLEY", "STREET", "STREET", "ALLEY", "POLICE …
## $ Arrest <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
## $ Domestic <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ Beat <int> 1115, 725, 2531, 1922, 434, 815, 1421, 112, 25…
## $ District <int> 11, 7, 25, 19, 4, 8, 14, 1, 25, 8, 8, 15, 4, 1…
## $ Ward <int> 28, 17, 29, 47, 10, 23, 35, 42, 37, 16, 17, 28…
## $ `Community Area` <int> 26, 67, 25, 6, 51, 56, 22, 32, 25, 66, 66, 25,…
## $ `FBI Code` <chr> "18", "18", "18", "18", "18", "18", "18", "18"…
## $ `X Coordinate` <int> 1149482, 1166774, 1137152, 1163818, 1192874, 1…
## $ `Y Coordinate` <int> 1899030, 1862112, 1911278, 1925275, 1837123, 1…
## $ Year <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016…
## $ `Updated On` <chr> "02/10/2018 03:50:01 PM", "02/10/2018 03:50:01…
## $ Latitude <dbl> 41.87886, 41.77720, 41.91270, 41.95059, 41.708…
## $ Longitude <dbl> -87.72659, -87.66416, -87.77157, -87.67321, -8…
## $ Location <chr> "(41.878858346, -87.726594936)", "(41.77719854…
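The row count stays at 183 after the arrest filter, which suggests that every IDU-related incident in this extract was recorded with an arrest. A quick tabulation makes that explicit. This is a minimal sketch on made-up data: `toy_idu` and `arrest_summary` are hypothetical names, and with the real data `toy_idu` would be replaced by `IDUCrimes`.

```r
library(dplyr)

# Toy stand-in for IDUCrimes; values are made up for illustration
toy_idu <- data.frame(
  IUCR   = c("2170", "2170", "2110"),
  Arrest = c(TRUE, TRUE, TRUE)
)

# Number of incidents per IUCR code and arrest outcome
arrest_summary <- count(toy_idu, IUCR, Arrest)

arrest_summary
```

If the real table contains no rows with `Arrest` equal to `FALSE`, the arrest filter leaves the data unchanged, and the arrest count equals the incident count for these codes.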
Next, I want to clean up this data by dropping the columns that are not necessary for my analysis, keeping only the IUCR code and the latitude and longitude coordinates.
IDU_arrests = dplyr::select(IDU_arrests, IUCR, Latitude, Longitude)
glimpse(IDU_arrests)
## Rows: 183
## Columns: 3
## $ IUCR <chr> "2170", "2170", "2170", "2170", "2170", "2170", "2170", "21…
## $ Latitude <dbl> 41.87886, 41.77720, 41.91270, 41.95059, 41.70803, 41.80694,…
## $ Longitude <dbl> -87.72659, -87.66416, -87.77157, -87.67321, -87.56929, -87.…
Next, I want to read in the Chicago community area boundaries shapefile.
ChiAreas <-st_read("Boundaries - Community Areas (current) (1)")
## Reading layer `geo_export_f808c167-2d30-4df4-9de6-288e921ef41f' from data source `/Users/brifadden/Desktop/IDU_project/Boundaries - Community Areas (current) (1)' using driver `ESRI Shapefile'
## Simple feature collection with 77 features and 9 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## geographic CRS: WGS84(DD)
glimpse(ChiAreas)
## Rows: 77
## Columns: 10
## $ area <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ area_num_1 <chr> "35", "36", "37", "38", "39", "4", "40", "41", "42", "1", …
## $ area_numbe <chr> "35", "36", "37", "38", "39", "4", "40", "41", "42", "1", …
## $ comarea <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ comarea_id <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ community <chr> "DOUGLAS", "OAKLAND", "FULLER PARK", "GRAND BOULEVARD", "K…
## $ perimeter <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ shape_area <dbl> 46004621, 16913961, 19916705, 48492503, 29071742, 71352328…
## $ shape_len <dbl> 31027.05, 19565.51, 25339.09, 28196.84, 23325.17, 36624.60…
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-87.60914 4..., MULTIPOLYGON …
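With the arrest coordinates and the community area polygons both in hand, the geoprocessing step amounts to a point-in-polygon count. The following is only a sketch of one way to do this with `sf`, not necessarily the approach taken in the rest of this analysis. It runs on made-up geometry: `toy_arrests`, `toy_areas`, `toy_points`, `square()`, and `arrest_count` are hypothetical names, and it assumes the arrest coordinates share the boundary file's WGS84 CRS (EPSG:4326).

```r
library(sf)

# Toy stand-in for IDU_arrests; coordinates are made up for illustration
toy_arrests <- data.frame(
  IUCR      = c("2170", "2170", "2110"),
  Latitude  = c(0.5, 0.25, 0.5),
  Longitude = c(0.5, 0.75, 1.5)
)

# Helper that builds a 1 x 1 degree square with its lower-left corner at (x0, 0)
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Toy stand-in for ChiAreas: two adjacent squares with character area numbers
toy_areas <- st_sf(
  area_numbe = c("1", "2"),
  geometry   = st_sfc(square(0), square(1), crs = 4326)
)

# Turn the coordinate columns into point geometries in the same CRS
toy_points <- st_as_sf(toy_arrests, coords = c("Longitude", "Latitude"), crs = 4326)

# st_intersects() lists the points falling in each polygon; lengths() counts them
toy_areas$arrest_count <- lengths(st_intersects(toy_areas, toy_points))

toy_areas
```

Counting by geometry keeps areas with zero arrests in the result, which matters when the counts are later mapped for all 77 community areas.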